MicroUnity's Terpsichore System Architecture describes general-purpose 
processor, memory, and interface subsystems, organized to operate at enormously 
higher bandwidth rates than traditional computers. 

Terpsichore's Euterpe processor performs integer, floating point, and signal 
processing operations at data rates up to 512 bits (i.e., up to four 128-bit operand 
groups) per instruction. The instruction set design carries the concept of 
streamlining beyond Reduced Instruction Set Computer (RISC) architectures, 
since it targets implementations that issue several instructions per machine cycle. 


The Terpsichore memory subsystem provides [...] virtual addressing 
and 64-bit physical addressing for [...] and other advanced OS 
environments. Caches supply the high [...] issue rates of the 
processor, and support coherency primitives [...] multiprocessors. The 
memory subsystem includes mechanisms [...] data rates not only in 
block transfer modes, but [...] scatter/gather access 
patterns. 

Hermes channels provide [...] system components with 
gigabyte-per-second bandwidth. Terpsichore's Cerberus serial bus provides a 
flexible, robust [...] mechanism to [...] system initialization, 
configuration, [...] and [...] recovery. [...] memory interface 
devices provide [...] industry-standard memory 
components [...]. 

Terpsichore's C[...] subsystem is tightly integrated with the processor 
and memory, [...] bandwidth and real-time response needs of video, 
audio, network [...]. Integration provides for the sharing 
of memory [...] processor, without distributed 
or dedicated [...] adapter. 


[...] incorporates Icarus interprocessor interfaces for 
[...] of small-scale, coherently-cached, shared-memory multiprocessors, 
without additional circuitry. Icarus interfaces may also be used to connect 
Terpsichore processors to a high-performance switching fabric for large-scale 
multiprocessors, or to adapters to standard interprocessor interfaces, such as 
Scalable Coherent Interface (IEEE standard 1596-1992). 

The goal of the Terpsichore architecture is to integrate these processor, memory 
and interface capabilities with optimal simplicity and generality. From the 
software perspective, the entire machine state consists of a program counter, a 
single bank of 64 general-purpose 64-bit registers, and a linear byte-addressed 
shared memory space with mapped interface registers. All interrupts and 
exceptions are precise, and occur with such low overhead that even cache misses 
can be handled as exceptions under software control. 

This document is intended for Terpsichore software and hardware developers 
alike, and defines the interface at which their designs must meet. Terpsichore 
pursues the most efficient tradeoffs between hardware and software complexity 
by making all processor, memory, and interface resources directly accessible to 
high-level language programs. 


Conformance 

To ensure that Terpsichore systems are able to freely interchange data, user-level 
programs, system-level programs and interface devices, the Terpsichore system 
architecture reaches above the processor level architecture. 

Mandatory and Optional Areas 

A computer system conforms to the requirements of the Terpsichore System 
Architecture if and only if it implements all the specifications described in this 
document and other specifications included by reference. Conformance to the 
specification is mandatory in all areas, including the instruction set, memory 
management system, interface devices and external [...], and bootstrap 
ROM functional requirements, except where explicit options are [...]. 

Optional areas include: 

Number of processors 
Size of first-level cache memories 
Existence of a second-level cache 
Size of second-level cache memory 
Size of system-level memory 
Existence of certain optional interface devices 



Conformance [...] regarding the physical 
implementation [...] of the Cerberus serial bus 
architecture, [...] channel architecture, and the Icarus 
interprocessor [...]. An implementation may replace, 
modify or eliminate [...] that the software-level functionality 
is unchanged. 

[...] may modify the architecture in an upward-compatible 
manner, such as by the addition of new instructions, definition of 
reserved bits in system state, or addition of new standard interfaces. Such 
modifications will be added as options, so that designs which conform to this 
version of the architecture will conform to future, modified versions. 

Additional devices and interfaces, not covered by this standard, may be added in 
specified regions of the physical memory space, provided that system reset places 
these devices and interfaces in an inactive state that does not interfere with the 
operation of software that runs in any conformant system. The software interface 
requirements of any such additional devices and interfaces must be made as 
widely available as this architecture specification. 
Unrestricted Physical Implementation 

Nothing in this specification should be construed to limit the implementation 
choices of the conformant system beyond the specific requirements stated herein. 
In particular, a computer system may conform to the Terpsichore System 
Architecture while employing any number of components, dissipating any amount 
of heat, requiring any special environmental facilities, or being of any physical size. 

Draft Version 

This document is a draft version of the architectural specification. In this form, 
conformance to this document may not be claimed or [...]. MicroUnity may 
change this specification at any time, in any manner, until it has been declared 
final. When this document has been declared final, the only changes will be to 
correct bugs, defects or deficiencies, and to add upward-compatible optional 
extensions. 
Overview 

Notation 

The descriptive notation used in this document is summarized in the table below: 


x + y     two's complement or floating-point addition of x and y 

x - y     two's complement or floating-point subtraction of y from x 

x * y     two's complement or floating-point multiplication of x and y 

x / y     two's complement or floating-point division of x by y 

x = y     two's complement or floating-point equality comparison 
          between x and y. Result is a single bit. 

x ≠ y     two's complement or floating-point inequality comparison 
          between x and y. Result is a single bit. 


The ordering of bits in this document is always little-endian, regardless of the 
ordering of bytes within larger structures. 

Instruction mnemonics are usually written with periods (.) separating elements of 
the mnemonic to make them easier to understand. Terpsichore assemblers and 
other code tools treat these periods as optional; the mnemonics are designed to be 
parsed either with or without the periods. 


Representation 

Terpsichore memory is byte-addressed, using either little-endian or big-endian 
byte ordering. The selection of byte ordering is dynamic, so that little-endian and 
big-endian processes, and even data structures within a process, can be 
intermixed on the processor. Terpsichore provides eight-byte (64-bit) virtual 
address, physical address, and data path sizes, and uses fixed-length four-byte 
(32-bit) instructions. Arithmetic is performed on two's-complement or unsigned 
binary and ANSI/IEEE standard 754-1985 conforming binary floating-point 
number representations. 
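
The effect of selecting a byte ordering can be illustrated with a short Python sketch. The helper name `load` and the sample memory contents are assumptions made for illustration and are not part of the architecture; the sketch only shows how the same bytes yield different values under the two orderings.

```python
def load(memory: bytes, i: int, s: int, big_endian: bool) -> int:
    """Form an s-byte unsigned value from memory bytes i .. i+s-1."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[i:i + s], order)

# Hypothetical memory contents, byte 0 first.
memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])

# The same eight bytes read as one 64-bit value under each ordering.
little = load(memory, 0, 8, big_endian=False)
big = load(memory, 0, 8, big_endian=True)
print(hex(little))  # 0x8877665544332211
print(hex(big))     # 0x1122334455667788
```

Because the ordering is an argument of each access rather than a global mode, two data structures with different orderings can sit side by side in the same memory.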

Memory 

Memory is an array of bytes, without a specified byte ordering. 


| byte 0 | byte 1 | byte 2 | ... | byte [...] | 

Terpsichore memory, including memory-mapped registers, must conform to the 
following requirements regarding side-effects of read or load operations: 

A memory read must have no side-effects on the contents of the addressed 
memory nor on the contents of any other memory. 

A memory write may cause side-effects on the contents of memory not 
addressed by the write operation; however, a second memory write of the same 
value to the same address must have no side-effects on any memory: memory 
write operations must be idempotent. 

Euterpe store instructions which are weakly ordered may have side-effects on the 
contents of memory not addressed by the store itself; subsequent load instructions 
which are also weakly ordered may or may not return values which reflect the 
side-effects. 
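
The read and write requirements can be made concrete with a small model. The sketch below is only an illustration: the `MappedRegisters` class, its two cells, and the rule linking them are hypothetical and do not describe any real Terpsichore device.

```python
class MappedRegisters:
    """Hypothetical device: writing to CONTROL also updates STATUS."""
    CONTROL, STATUS = 0, 1

    def __init__(self):
        self.cells = [0, 0]

    def read(self, address: int) -> int:
        # A read returns the stored value and changes nothing.
        return self.cells[address]

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value
        if address == self.CONTROL:
            # Side-effect on memory not addressed by the write. It depends
            # only on the written value, so repeating the write is harmless.
            self.cells[self.STATUS] = value & 1

def snapshot_after(writes):
    """Apply (address, value) writes to a fresh device; return its cells."""
    device = MappedRegisters()
    for address, value in writes:
        device.write(address, value)
    return list(device.cells)

once = snapshot_after([(0, 5)])
twice = snapshot_after([(0, 5), (0, 5)])
print(once == twice)  # True
```

A register that, for example, incremented a counter on every write would fail this check and would therefore not satisfy the idempotence requirement.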
Fixed-point Data 

Byte 

A byte is a single element of the memory array: 

7      0 
| byte | 
    8 


Larger data structures are constructed from the concatenation of bytes in either 
little-endian or big-endian byte ordering. A memory access of a data structure of 
size s at address i is formed from memory bytes at addresses i through i+s-1. 

[...] of alignment: it is not [...] 
memory clock cycle than unaligned accesses. 

With little-endian byte ordering, the bytes are arranged as: 

s*8-1    s*8-8         15       8 7      0 
| byte i+s-1 |   ...   | byte i+1 | byte i | 

With big-endian byte ordering, the bytes are arranged as: 

s*8-1 s*8-8 s*8-9 s*8-16         7          0 
| byte i | byte i+1 |    ...    | byte i+s-1 | 


Doublet 

A doublet is the concatenation of two bytes: 

15        0 
| doublet | 
    16 

Quadlet 

A quadlet is the concatenation of four bytes: 

31        0 
| quadlet | 
    32 
Octlet 

An octlet is the concatenation of eight bytes: 

63             32 
| octlet 63..32 | 
       32 
31              0 
| octlet 31..0  | 
       32 

Hexlet 

A hexlet is the concatenation of sixteen bytes: 

127             96 
| hexlet 127..96 | 
       32 
95              64 
| hexlet 95..64  | 
       32 
63              32 
| hexlet 63..32  | 
       32 
31               0 
| hexlet 31..0   | 
       32 

Address 

Terpsichore addresses are quadlet, octlet, or hexlet quantities, depending on the 
level of the implementation. 

Floating-point Data 

Terpsichore's floating-point formats are designed to satisfy ANSI/IEEE standard 
754-1985: Binary Floating-point Arithmetic. Standard 754 leaves certain aspects to 
the discretion of the implementor: 

Terpsichore adds additional half-precision and quad-precision formats to 
standard 754's single-precision and double-precision formats. Terpsichore's 
double-precision satisfies standard 754's precision requirements for a single- 
extended format, and Terpsichore's quad-precision satisfies standard 754's 
precision requirements for a double-extended format. 
Quiet NaN values are denoted by any sign bit value, an exponent field of all one 
bits, and a non-zero significand with the most significant bit cleared. Quiet NaN 
values generated by default exception handling of standard operations have a zero 
sign bit, an exponent field of all one bits, and a significand field with the most 
significant bit cleared, the next-most significant bit set, and all other bits cleared. 

Signaling NaN values are denoted by any sign bit value, an exponent field of all 
one bits, and a non-zero significand with the most significant bit set. 
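
The NaN encodings described above can be expressed as a small classifier. This is a sketch under stated assumptions: the function names are hypothetical, and the field widths used in the example (8 exponent bits, 23 significand bits) are those of the standard 754 single-precision format.

```python
def classify(bits: int, exp_bits: int, sig_bits: int) -> str:
    """Classify a floating-point bit pattern using this document's NaN rules."""
    exponent = (bits >> sig_bits) & ((1 << exp_bits) - 1)
    significand = bits & ((1 << sig_bits) - 1)
    if exponent != (1 << exp_bits) - 1 or significand == 0:
        return "not a NaN"
    # Most significant significand bit: cleared means quiet, set means signaling.
    msb = significand >> (sig_bits - 1)
    return "signaling NaN" if msb else "quiet NaN"

def default_quiet_nan(exp_bits: int, sig_bits: int) -> int:
    """Zero sign, all-ones exponent, significand 0100...0."""
    exponent = ((1 << exp_bits) - 1) << sig_bits
    return exponent | (1 << (sig_bits - 2))

single = default_quiet_nan(8, 23)
print(hex(single))                   # 0x7fa00000
print(classify(single, 8, 23))       # quiet NaN
print(classify(0x7FC00000, 8, 23))   # signaling NaN
print(classify(0x7F800000, 8, 23))   # not a NaN (infinity)
```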

Half-precision Floating-point 

Terpsichore half precision uses a format similar to standard 754's requirements, 
reduced to a 16-bit overall format. The format contains sufficient precision and 
exponent range to hold a 12-bit signed integer. 

15   14     10 9           0 
|sign|exponent| significand | 
  1      5         10 
Quad-precision Floating-point 

Terpsichore quad precision satisfies standard 754's requirements for "double 
extended," but has additional significand precision to use 128 bits. 

127  126    112 111               96 
|sign| exponent | significand 111..96 | 
  1      15              16 
95                      64 
|  significand 95..64  | 
          32 
63                      32 
|  significand 63..32  | 
          32 
31                       0 
|  significand 31..0   | 
          32 


Instruction 

A Terpsichore instruction is specifically defined as a four-byte structure with the 
ordering shown below. It is different from the quadlet defined above because the 
placement of instructions into memory must be independent of the byte ordering 
used for data structures. Instructions must be aligned on four-byte boundaries: in 
the diagram below, i must be a multiple of 4. 

31    24 23      16 15       8 7        0 
| byte i | byte i+1 | byte i+2 | byte i+3 | 
    8         8          8          8 
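
A minimal sketch of how an instruction word might be assembled from memory under this definition follows. The function name and sample bytes are hypothetical, and the sketch assumes the diagram is read with byte i occupying bits 31..24.

```python
def fetch_instruction(memory: bytes, i: int) -> int:
    """Return the 32-bit instruction word stored at byte address i."""
    if i % 4 != 0:
        raise ValueError("instruction address must be a multiple of 4")
    # byte i supplies bits 31..24 and byte i+3 supplies bits 7..0,
    # whatever byte ordering the running process uses for its data.
    return int.from_bytes(memory[i:i + 4], "big")

code = bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x02, 0x03, 0x04])
print(hex(fetch_instruction(code, 0)))  # 0xdeadbeef
print(hex(fetch_instruction(code, 4)))  # 0x1020304
```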


Gateway 

A Terpsichore gateway is specifically defined as a 16-byte structure with the 
ordering shown below. A gateway contains a code address used to invoke a 
procedure at a higher privilege level securely. Gateways are marked by protection 
information specified in the TLB. Gateways must be aligned on 16-byte 
boundaries, that is, in the diagram below, i must be a multiple of 16. 

127     120 119     112 111     104 103       96 
| byte i    | byte i+1  | byte i+2  | byte i+3  | 
     8           8           8           8 
95       88 87       80 79       72 71        64 
| byte i+4  | byte i+5  | byte i+6  | byte i+7  | 
     8           8           8           8 
63       56 55       48 47       40 39        32 
| byte i+8  | byte i+9  | byte i+10 | byte i+11 | 
     8           8           8           8 
31       24 23       16 15        8 7          0 
| byte i+12 | byte i+13 | byte i+14 | byte i+15 | 
     8           8           8           8 

The gateway contains a code address, right-aligned within the 128 bit structure: 



The user state consists of hardware data structures that are accessible to all 
[...] code. The Terpsichore user state is designed to be as 
[...] as possible, and consists only of the general registers, the program 
counter, and virtual memory. There are no specialized registers for condition 
codes, operating modes, rounding modes, integer multiply/divide, or floating-point [...]. 
General Registers 

Terpsichore user state includes 64 general registers. All are identical; there is no 
dedicated zero-valued register, and there are no dedicated floating-point registers. 


REG[0] 
REG[1] 
... 
REG[62] 
REG[63] 


Program Counter 

The program counter contains the address of the currently executing instruction. 
This register is implicitly manipulated by branch instructions, and read by branch 
instructions that save a return address in a general register. 

127                  2 1 0 
|   ProgramCounter    | 0 | 
         126            2 

The program counter may be implemented as 30, 62, or 126 bits, depending upon 
the level of implementation. Any unimplemented bits are always zero. 

Privilege Level 

The privilege level register contains the privilege level of the currently executing 
[...]. This register is implicitly manipulated by branch gateway and branch 
[...] instructions, and read by branch gateway instructions that save a return 
[...] general register. 

[Figure: privilege level register, bits 1..0, 2 bits wide] 

System state 

The system state consists of the facilities not normally used by conventional 
compiled code. These facilities provide mechanisms to execute such code in a fully 
virtual environment. All system state is memory mapped, so that it can be 
manipulated by compiled code. 
Fixed-point 

Terpsichore provides load and store instructions to move data between memory 
and the registers, branch instructions to compare the contents of registers and to 
transfer control from one code address to another, and arithmetic operations to 
perform computation on the contents of registers, returning the result to registers. 

Load and Store 

The load and store instructions move data between memory and the registers. 
When loading data from memory into a register, values are zero-extended or sign- 
extended to fill the register. When storing data from a register into memory, 
values are truncated on the left to fit the specified memory region. 
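
The extension and truncation rules for loads and stores can be sketched as follows. The function names are hypothetical; registers are modeled as 64-bit unsigned Python integers.

```python
MASK64 = (1 << 64) - 1

def load_extend(value: int, size_bytes: int, signed: bool) -> int:
    """Extend a size_bytes-wide memory value to a 64-bit register image."""
    bits = 8 * size_bytes
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits          # interpret the field as negative
    return value & MASK64           # two's-complement register contents

def store_truncate(register: int, size_bytes: int) -> int:
    """Drop the high-order (left) bits that do not fit the memory region."""
    return register & ((1 << (8 * size_bytes)) - 1)

print(hex(load_extend(0x80, 1, signed=False)))         # 0x80
print(hex(load_extend(0x80, 1, signed=True)))          # 0xffffffffffffff80
print(hex(store_truncate(0x1234_5678_9ABC_DEF0, 2)))   # 0xdef0
```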


Load and store instructions that specify [...] more than one byte 
may use either little-endian or big-endian [...] and ordering are 
explicitly specified in the instruction. [...] one byte may be either 
aligned to addresses that are [...] size of the region, or of 
unspecified alignment: alignment [...] explicitly specified in the 
instruction. 

The load and store instructions [...] as well as floating- 
point and digital signal [...] single bank of registers 
for all data types. 

Swap instructions [...] synchronization, using 
indivisible operations [...], and multiplex-and- 
swap. A store-multiplex [...] to indivisibly write to a 
portion of an [...] operate on aligned octlet data, using 
either little-[...] 


[...] fixed-point compare-and-branch instructions provide all arithmetic tests for 
equality and inequality of signed and unsigned fixed-point values. Tests are 
performed either between two operands contained in general registers, or on the 
bitwise and of two operands. Depending on the result of the compare, either a 
branch is taken, or not taken. A taken branch causes an immediate transfer of the 
program counter to the target of the branch, specified by a 12-bit signed offset 
from the location of the branch instruction. A non-taken branch causes no 
transfer; execution continues with the following instruction. 

Branch Unconditionally 




Other branch instructions provide for unconditional transfer of control to 
addresses too distant to be reached by a 12-bit offset, and to transfer to a target 
while placing the location following the branch into a register. The branch through 
gateway instruction provides a secure means to access code at a higher privilege 
level, in a form similar to a normal procedure call. 
Arithmetic Operations 

The fixed-point arithmetic operations include add, subtract, multiply, divide, 
shifts, and set on compare, all using octlet-sized operands. Multiply and divide 
operations produce hexlet results; all other operations produce octlet results. 

When specified, add, subtract, and shift operations may cause a fixed-point 
arithmetic exception to occur on resulting conditions such as signed overflow, or 
signed or unsigned equality or inequality to zero. 

Floating-point 

Terpsichore provides all the facilities mandated and recommended by 
ANSI/IEEE standard 754-1985: Binary Floating-point Arithmetic, with the use of 
supporting software. 


Branch Conditionally 


The floating-point compare-and-branch instructions provide all the comparison 
types required and suggested by the IEEE floating-point standard. These floating- 
point comparisons augment the usual types of numeric value comparisons with 
special handling for NaN (not-a-number) values. A NaN compares as 
"unordered" with respect to any other value, even that of an 
identical NaN. 

Terpsichore floating-point [...] do not generate an 
exception on [...]; if such exceptions 
are desired, they [...] use of a floating-point 
compare and set [...] point compare-and-branch 
instruction on [...] or a fixed-point compare-and-branch on 
the set result. 

[...] greater relations are anti-commutative, one of each relation 
[...] only by the replacement of an L with a G in the code can 
[...] by reversing the order of the operands and using the other code. 
[...] UL relation can be used in place of a UG relation by swapping the 
operands to the compare-and-branch or compare-and-set instruction. 

The E and NE relations can be used to determine the unordered condition of a 
single operand by comparing the operand with itself. 
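
The behavior of these relations can be sketched by classifying each pair of operands into one of four mutually exclusive outcomes and listing, for each code, the outcomes for which the comparison is true. The function and dictionary names are hypothetical; the truth values follow the compare-and-branch table in this section.

```python
import math

def relation(x: float, y: float) -> str:
    """Return which of the four mutually exclusive outcomes holds."""
    if math.isnan(x) or math.isnan(y):
        return "unordered"
    if x > y:
        return "greater"
    if x < y:
        return "less"
    return "equal"

# Each code lists the outcomes for which the comparison is true.
CODES = {
    "E":    {"equal"},
    "NE":   {"unordered", "greater", "less"},
    "UE":   {"unordered", "equal"},
    "NUE":  {"greater", "less"},
    "NUGE": {"less"},
    "UGE":  {"unordered", "greater", "equal"},
    "UL":   {"unordered", "less"},
    "NUL":  {"greater", "equal"},
}

def compare(code: str, x: float, y: float) -> bool:
    return relation(x, y) in CODES[code]

def compare_ug(x: float, y: float) -> bool:
    """UG is not a separate code: swap the operands and use UL."""
    return compare("UL", y, x)

nan = float("nan")
print(compare("E", nan, nan))    # False: a NaN is unordered even with itself
print(compare("NE", nan, nan))   # True, so NE(x, x) detects a NaN operand
print(compare_ug(3.0, 1.0))      # True
```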
The following floating-point compare-and-branch relations are provided: 

Mnemonic          Branch taken if values compare as:     Exception if 
code    C-like    Unordered  Greater  Less   Equal       unordered  invalid 
E       ==        F          F        F      T           no         no 
NE      !=        T          T        T      F           no         no 
UE      ?=        T          F        F      T           no         no 
NUE     !?=       F          T        T      F           no         no 
NUGE    !?>=      F          F        T      F           no         no 
UGE     ?>=       T          T        F      T           no         no 
UL      ?<        T          F        T      F           no         no 
NUL     !?<       F          T        F      T           no         no 


Compare-and-set 

The floating-point compare-and-set [...] provide [...] the comparison types 
supported as compare-and-branch [...]. Terpsichore floating-point 
compare-and-set instructions [...] generate an exception on 
comparisons involving [...]. 

Mnemonic          Result if values compare as:           Exception if 
code    C-like    Unordered  Greater  Less   Equal       unordered  invalid 
E       ==        F          F        F      T           no         no 
NE      !=        T          T        T      F           no         no 
UE      ?=        T          F        F      T           no         no 
NUE     !?=       F          T        T      F           no         no 
NUGE    !?>=      F          F        T      F           no         no 
UGE     ?>=       T          T        F      T           no         no 
UL      ?<        T          F        T      F           no         no 
NUL     !?<       F          T        F      T           no         no 
E.X     ==        F          F        F      T           no         yes 
NE.X    !=        T          T        T      F           no         yes 
UE.X    ?=        T          F        F      T           no         yes 
NUE.X   !?=       F          T        T      F           no         yes 
L.X     <         F          F        T      F           yes        yes 
NL.X    !<        T          T        F      T           yes        yes 
NGE.X   !>=       T          F        T      F           yes        yes 
GE.X    >=        F          T        F      T           yes        yes 
Arithmetic Operations 

The operations supported in hardware are floating-point add, subtract, multiply, 
divide, and square root. Other operations required by the ANSI/IEEE floating- 
point standard are provided by software libraries. 

The operations explicitly specify the precision of the operation, and round the 
result to the specified precision at the conclusion of each operation. 
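
Rounding once per operation has a visible consequence for the combined multiply-add that this section describes as computed as if in infinite precision. The sketch below models that behavior with exact rational arithmetic, on the assumption that the exact result is rounded a single time at the end. The function name `multiply_add` is hypothetical, and Python's double-precision floats stand in for the hardware format.

```python
from fractions import Fraction

def multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c evaluated exactly, then rounded once to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # conversion rounds the exact value once

a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

separate = a * b + c            # product rounded, then sum rounded
fused = multiply_add(a, b, c)   # rounded only at the end
print(separate)                 # 0.0
print(fused == -(2.0 ** -54))   # True
```

With separate operations the exact product 1 - 2^-54 rounds to 1.0 and the information is lost; with a single rounding the small difference survives, which is why the combined form suits accumulation-heavy routines.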


A single instruction provides a floating-point multiply with [...] fed into a 
floating-point add. The result is computed as if the multiply [...] performed to 
infinite precision, added as if in infinite precision, then [...]. This operation is 
a particularly good match to the needs of vector linear [...] routines. 

Rounding 

Rounding is specified within the [...] avoid maintaining 
explicit state for a rounding mode. 

Exceptions 

All the mandated floating-point [...] a trap when they 
occur; maintenance [...] performed using software 
routines. Because [...] be very frequent, this 
exception only [...] explicitly. Arithmetic 
operations [...] be handled by default, 
generating [...] 


[...] operations that maintain the fullest 
[...] operating on lower-precision 
[...] vector values. These operations are useful for several 
[...], including digital signal processing, image processing, and 
[...] graphics. The basic goal of these operations is to accelerate the 
performance of algorithms that exhibit the following characteristics: 

Low-precision arithmetic 

The operands and intermediate results are fixed-point values represented in no 
greater than 64 bit precision. For floating-point arithmetic, operands and 
intermediate results are of 16, 32, or 64 bit precision. 

The use of fixed-point arithmetic permits various forms of operation reordering 
that are not permitted in floating-point arithmetic. Specifically, commutativity and 
associativity, and distribution identities can be used to reorder operations. 
Compilers can evaluate operations to determine what intermediate precision is 
required to get the specified arithmetic result. 
Terpsichore supports several levels of precision, as well as operations to convert 
between these different levels. These precision levels are always powers of two, 
and are explicitly specified in the operation code. 

Sequential access to data 

The algorithms are or can be expressed as operations on sequentially ordered 
items in memory. Scatter-gather memory access or sparse-matrix techniques are 
not required. 

Where an index variable is used with a multiplier, such multipliers must be 
powers of two. When the index is of the form nx+k, the value of n must be a 
power of two, and the values referenced should have k include the majority of 
values in the range 0..n-1. A negative multiplier may also [...]. 

Vectorizable operations 

The operations performed on these items are identical and 
independent. Conditional operations [...] to use boolean variables 
or masking, or the compiler [...] into such a form. 

Data-handling Operations 

The characteristics of [...] access to data, which 
permit the use [...] reference the data. 
Octlet and [...] items of data, the 
number depending [...] 

The discussion [...] byte ordering, though the 
ordering of [...] must be consistent with the 
ordering used for [...] byte ordering is used for the 
loads and stores, the [...] that index values increase from 
left to right, and for little-[...] the index values increase from 
[...] indicate different index values with [...] 

When an index of the nx+k form is used in array operands, where n is a power of 
2, data memory sequentially loaded contains elements useful for separate 
operands. The "deal" instruction divides a hexlet of data up into two octlets, with 
alternate bit fields of the source hexlet grouped together into the two results. For 
example, a G.DEAL.16 operation rearranges the source hexlet into two octlets as 
follows: 1 


1 An example of the use of a deal can be found in the appendix: Digital Signal Processing 
Applications: Decimation of Monochrome Image or Decimation of Color Image 
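
The effect of a 16-bit deal can be sketched in Python. This is an illustration only: the helper names are hypothetical, field 0 is taken to be the least significant bit field, and the choice of which result receives the even-numbered fields is an assumption rather than something the text above specifies.

```python
def fields_of(value: int, width: int, count: int) -> list:
    """Split an integer into `count` bit fields, least significant first."""
    mask = (1 << width) - 1
    return [(value >> (width * k)) & mask for k in range(count)]

def pack(fields: list, width: int) -> int:
    """Concatenate bit fields, placing fields[0] in the low-order bits."""
    value = 0
    for k, field in enumerate(fields):
        value |= field << (width * k)
    return value

def deal(hexlet: int, width: int) -> tuple:
    """Group alternate fields of a 128-bit source into two 64-bit results."""
    fields = fields_of(hexlet, width, 128 // width)
    return pack(fields[0::2], width), pack(fields[1::2], width)

# Interleaved samples a0 b0 a1 b1 ... stored as eight doublets.
source = pack([0xA0, 0xB0, 0xA1, 0xB1, 0xA2, 0xB2, 0xA3, 0xB3], 16)
first, second = deal(source, 16)
print(f"{first:016x}")   # 00a300a200a100a0
print(f"{second:016x}")  # 00b300b200b100b0
```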
In the deal operation, the source hexlet is specified by two octlet registers, and the 
two result octlets are specified as a hexlet register pair. This sounds backwards, 
and it really is, but it works in practice, because the result is usually used in 
operations that accept octlet operands. Ideally, the source hexlet should be a 
register pair, and the result should be two octlet registers. 

The example above directly applies to the case where n is 2. When n is larger, a 
series of DEAL operations can be used to further subdivide the sequential stream. 
For example, when n is 4, we need to deal out 4 sets of doublet operands, as shown 
in the figure below: 2 

[Figure: four-way deal of doublet operands] 

This 4-way deal is performed by dealing out 2 sets of quadlet operands, and then 
dealing each of them out into 2 sets of doublet operands. 


2 An example of the use of a four-way deal can be found in the appendix: Digital Signal 
Processing Applications: Conversion of Color to Monochrome 
There are three rows of arrows shown above. The first row is the result of two 
G.DEAL.32 operations, each independently dealing 2 sets of pairs of doublets. 
The result of these two operations is the second row of boxes. The last row is the 
result of two independent G.DEAL.[...] each dealing 2 sets of doublets. 
[...] shows the implicit action performed 
[...] hexlet sources of the G.DEAL.16 
[...] 



[...] result of computation is accessed with an index of the form nx+k, 
[...] n a power of 2, the reverse of the "deal" operation needs to be performed on 
vectors of results to interleave them for storage in sequential order. The "shuffle" 
operation interleaves the bit fields of two octlets of results into a single hexlet. For 
example a G.SHUFFLE.16 operation combines two octlets of doublet fields into a 
hexlet as follows: 
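
A 16-bit shuffle can be sketched as the interleaving of two lists of fields. The helper names are hypothetical, field 0 is taken to be the least significant field, and the assumption that the first operand supplies the even-numbered result fields is made only for illustration.

```python
def fields_of(value: int, width: int, count: int) -> list:
    """Split an integer into `count` bit fields, least significant first."""
    mask = (1 << width) - 1
    return [(value >> (width * k)) & mask for k in range(count)]

def pack(fields: list, width: int) -> int:
    """Concatenate bit fields, placing fields[0] in the low-order bits."""
    value = 0
    for k, field in enumerate(fields):
        value |= field << (width * k)
    return value

def shuffle(first: int, second: int, width: int) -> int:
    """Interleave the fields of two 64-bit operands into a 128-bit result."""
    count = 64 // width
    pairs = zip(fields_of(first, width, count), fields_of(second, width, count))
    return pack([field for pair in pairs for field in pair], width)

a = pack([1, 2, 3, 4], 16)
b = pack([5, 6, 7, 8], 16)
print(fields_of(shuffle(a, b, 16), 16, 8))  # [1, 5, 2, 6, 3, 7, 4, 8]
```

Under these assumptions the shuffle undoes a deal of the same field width, which is the relationship the text describes.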
For larger values of n, a series of shuffle operations can be used to combine 
additional sets of fields, similarly to the mechanism used for the deal operations. 
For example, when n is 4, we need to shuffle in 4 sets of doublet operands, as 
shown in the figure below: 3 

[Figure: four-way shuffle of doublet operands] 

This 4-way shuffle is performed by shuffling up 2 sets of doublet operands, and 
then shuffling each of them up as 2 sets of quadlet operands. 


3 An example of the use of a four-way shuffle can be found in the appendix: Digital Signal 
Processing Applications: Conversion of Monochrome to Color 
[...] row is the result of two 
[...] 2 sets of pairs of 
[...] second row of boxes. The last 
[...]32 operations, each shuffling 2 
[...] row of arrows shows the implicit 
[...] adjacent registers for the two octlet 
[...] 


[...] of a source array operand or a destination array result is negated, 
[...] other words, if of the form nx+k where n is negative, the elements of the 
array must be arranged in reverse order. The "swap" operation reverses the order 
of the bit fields in a hexlet. For example, a G.SWAP.16 operation reverses the 
doublets within a hexlet: 
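
The field reversal performed by a 16-bit swap can be sketched as follows. The function name is hypothetical and field 0 is taken to be the least significant doublet.

```python
def swap(hexlet: int, width: int) -> int:
    """Reverse the order of the `width`-bit fields within a 128-bit value."""
    count = 128 // width
    mask = (1 << width) - 1
    result = 0
    for k in range(count):
        field = (hexlet >> (width * k)) & mask
        # Field k moves to position count-1-k.
        result |= field << (width * (count - 1 - k))
    return result

hexlet = 0x0007_0006_0005_0004_0003_0002_0001_0000
print(f"{swap(hexlet, 16):032x}")
# 00000001000200030004000500060007
```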
Variations of the deal and shuffle operations are also [...] for converting from 
one precision to another. This may be required if [...] is represented in a 
different precision than another operand, or the [...] if computation must be 
performed with intermediate precision [...] of the operands, such as 
when using an integer multiply. 

When converting from a [...] precision, specifically when 
halving the precision of a [...] data must be discarded, 
and the bit fields packed [...] is a variant of the 
"deal" operation, in which [...] result is an octlet. An 
arbitrary half-sized [...] selected to appear in the 
result. For example, a sel[...] quadlet in a hexlet is 
performed by the G.COM[...] 



Compress 32 bits to 16, with 4-bit right shift 
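
The caption above can be read as selecting bits 19..4 of each quadlet. A sketch of that reading follows; the function name and parameter order are hypothetical, and plain truncation is assumed since the surviving text does not mention rounding or saturation.

```python
def compress(hexlet: int, result_width: int, shift: int) -> int:
    """Halve the width of each field of a 128-bit value after a right shift."""
    source_width = 2 * result_width
    source_mask = (1 << source_width) - 1
    result_mask = (1 << result_width) - 1
    octlet = 0
    for k in range(128 // source_width):
        field = (hexlet >> (source_width * k)) & source_mask
        # Keep bits shift+result_width-1 .. shift of each source field.
        octlet |= ((field >> shift) & result_mask) << (result_width * k)
    return octlet

quadlets = [0x0012345F, 0x000ABCD0, 0xFFFFFFFF, 0x00000000]
hexlet = sum(q << (32 * k) for k, q in enumerate(quadlets))
print(f"{compress(hexlet, 16, 4):016x}")  # 0000ffffabcd2345
```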


When converting from lower-precision to higher-precision, specifically when 
doubling the precision of an octlet of bit fields, one of several techniques can be 
used, either multiply, expand, or shuffle. Each has certain useful properties. In 
the discussion below, m is the precision of the source operand. 

The multiply operation, described in detail below, automatically doubles the 
precision of the result, so multiplication by a constant vector will simultaneously 
double the precision of the operand and multiply by a constant that can be 
represented in m bits. 
An operand can be doubled in precision and shifted left with the "expand" 
operation, which is essentially the reverse of the "compress" operation. For 
example the G.EXPAND.16,4 expands from 16 bits to 32, and shifts 4 bits left: 



The "shuffle" operation can double [...] and multiply it by 
1 (unsigned only), 2^m or 2^m+1, [...] the shuffle operation to 
be a zeroed register and the [...] operand and zero, or both 
to be the source operand. [...] constant can be freely 
added to the source operand [...] the right operand to 
the shuffle. 


[...] operations most 
[...]. The fixed-point 
[...] are most of the functions provided in the standard 
[...] check conditions. These functions include add, 
[...] set on condition, and multiply, in forms 
[...] as operands. The floating-point 
[...] as the scalar floating-point 
[...] packed set of bit fields of the same size as 
[...] multiply function intrinsically doubles 
[...] 

Conditional operations are provided only in the sense that the set on condition 
operations can be used to construct bit masks that can select between alternate 
vector expressions, using the bitwise boolean operations. All instructions operate 
over the entire octlet or hexlet operands, and produce a hexlet result. The sizes of 
the bit fields supported are always powers of two. 


Galois Field Operations 

Terpsichore provides a general software solution to the most common operations 
required for Galois Field arithmetic. The instruction provided is a polynomial 
divide, with the polynomial specified as one register operand. The result of a 
specified number of division steps, expressed as a register pair, is the result of the 
instruction. This instruction can be used to perform CRC generation and 
checking, Reed-Solomon code generation and checking, and spread-spectrum 
encoding and decoding. 
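
The link between polynomial division and CRC generation can be sketched in Python. This models only the underlying mathematics over GF(2); the function names are hypothetical, and the register-pair operand layout and step count of the actual instruction are not represented.

```python
def poly_remainder(dividend: int, divisor: int) -> int:
    """Remainder of GF(2) polynomial division; bit k is the x**k coefficient."""
    degree = divisor.bit_length() - 1
    while dividend.bit_length() > degree:
        # Align the divisor with the leading term and subtract (XOR).
        shift = dividend.bit_length() - 1 - degree
        dividend ^= divisor << shift
    return dividend

def crc(message: int, poly: int) -> int:
    """CRC of `message`: remainder after appending deg(poly) zero bits."""
    degree = poly.bit_length() - 1
    return poly_remainder(message << degree, poly)

message = 0b11010011101100
poly = 0b1011                      # x**3 + x + 1
check = crc(message, poly)
print(bin(check))                  # 0b100

# Checking: the message with its CRC appended divides evenly.
print(poly_remainder((message << 3) | check, poly))  # 0
```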

Register Usage 

All Terpsichore registers are identical and general-purpose; there is no dedicated 
zero-valued register, and no dedicated floating-point registers. By software 
convention, the non-specific general registers are used in more specific ways. 


register 
number     usage               how saved 
0          link                [...] 
1          dp                  [...] 
2-9        parameters          [...] 
10-31      temporary           caller 
32-61      saved               callee 
62         fp, when [...]      [...] 
63         sp                  callee 

At a procedure call [...] either by the caller or callee 
procedure, which [...] to avoid needing to 
save registers. [...] variables into caller or callee 
saved registers [...] procedure calls. 

The dp register [...] pointers, literals, and 
statically-allocated [...] procedure. The uses of 
the dp register [...] register, except that each 
procedure [...] expands the space addressable by 
small offset [...] distinction, as the offset field of 
Terpsichore [...] only 12 bits. The compiler may use 
additional registers and/or [...] address larger regions. 

[...] also permits code to be shared, with each static instance of the 
[...] assigned to a different address in memory. In conjunction with position- 
independent or pc-relative branches, this allows library code to be dynamically 
relocated and shared between processes. 


Procedure Calling Conventions 

Procedure parameters are normally allocated in registers, starting from register 2 
up to register 9. These registers hold up to 8 parameters, which may each be of 
any size from one byte to eight bytes, including single-precision and double- 
precision floating-point parameters. Quad-precision floating-point parameters 
require an aligned pair of registers. The C varargs.h or stdarg.h conventions may 
require saving registers into memory (this is not necessarily so, but some semi- 
portable semi-conventions such as _doprnt would break otherwise). Procedure 
return values are also allocated in registers, starting from register 2 up to register 

There are several data structures maintained in registers for the procedure calling 
conventions: link, sp, dp, fp. The link register contains the address to which the 
callee should return at the conclusion of the procedure. 

The sp register is used to form addresses to save parameter and other registers 
and to maintain local variables, i.e., data that is allocated as a LIFO stack. 
For procedures that require a stack, normally a single allocation is performed, 
which allocates space for input parameters, local variables, saved registers, 
and output parameters all at once. The sp register is always 16-byte aligned. 

The fp register is used [...] the stack size varies during execution [...] the 
GNU alloca function. When the stack [...], the sp register is used to address 
the [...] for other general purposes as [...]. 

The dp register is used to address pointers, literals, [...] variables for the 
procedure. The newpc register is loaded with the [...] of the procedure, and the 
newdp register is loaded with [...] dp register required for the procedure. This 
mechanism provides for dynamic linking, by initially filling in the link and dp 
fields in the data structure to point to the dynamic linker. The linker can use 
the current contents of the link and/or dp registers to determine the identity 
of the caller and callee, [...] to fill in the pointers and resume execution. 

Typical dynamic-linked, inter-module calling sequence: 

caller or callee (non-leaf): 
    A.ADDI   sp,-size 
    S.64     link,off(sp) 
    S.64     dp,off(sp) 
    ... (using dp) 
    L.64     [...] 
    L.64     dp,off(dp) 
    B.LINK   link,link 
    L.64     dp,off(sp) 
    ... (using dp) 
    L.64     link,off(sp) 
    A.ADDI   sp,size 
    B        link 

callee (leaf): 
    ... (using dp) 
    B        link 

The load instruction is required in the caller following the procedure call to 
restore the dp register. A second load instruction also restores the link register, 
which may be located at any point between the last procedure call and the branch 
instruction which returns from the procedure. 

System and Privileged Library Calls 

It is an objective to make [...] privileged libraries as similar as possible to 
normal procedure [...] above. Rather than invoke system calls as an exception, 
which involves significant latency and complication, we prefer to use a modified 
procedure call in which the process privilege level is quietly raised to the 
[...] level. To provide this mechanism safely, interaction with the [...] is 
required. 

Such a routine [...] its legitimate entry point, otherwise [...] level might be 
taken advantage of. [...] that the procedure is invoked at a proper [...] data 
pointer is properly set. To ensure this, [...] instruction retrieves a "gateway" 
directly from [...] space. The gateway is accessed via the data pointer [...] 
entry point of the procedure. A gateway [...] virtual address space designated 
to contain [...] cannot be forged. 
[Figure: gateway retrieval; legible labels: RF[0], code space] 


Similarly, a return [...] involves a reduction of privilege. This [...] 
architectural facilities, so a procedure may [...] code address. However, in 
certain, though [...] have highly privileged code call less-[...]. [...] a user 
may request that errors in a privileged [...] user-supplied error-logging 
routine. In such a [...] procedure actually requires an increase in [...] a 
branch-through-gateway [...] following the call, to raise the [...]. In such a 
case, special care must be taken [...] not permitted to gain unauthorized [...] 
registers, such as by saving all registers [...] stack frame that may be 
manipulated by the less-privileged [...]. 


Typical dynamic-linked, inter-gateway calling sequence: 

caller: 
    S.64     link,off(sp) 
    S.64     dp,off(sp) 

    L.64     dp,off(dp) 
    B.GATE   link,off(dp) 
    L.64     link,off(sp) 
    L.64     dp,off(sp) 

callee (non-leaf): 
    S.64     sp,off(dp) 
    L.64     sp,off(dp) 
    S.64     link,off(sp) 
    S.64     dp,off(sp) 
    ... (using dp) 
    L.64     link,off(sp) 
    L.64     dp,off(sp) 
    L.64     sp,off(dp) 
    B.DOWN   link 

callee (leaf): 
    ... (using dp) 
    B.DOWN   link 



The callee, if it uses a stack for local variable allocation, cannot necessarily trust 
the value of the sp passed to it, except as a region to receive parameters held in 
memory. 

Pipeline Organization 

Terpsichore performs [...] one, in-order, with precise exceptions [...]. 
Consequently, code which ignores the subsequent discussion [...] implementations 
will still perform correctly. However, [...] performance of the Terpsichore 
processor is achieved only by matching the ordering of instructions [...] 
characteristics of the pipeline. In the following discussion, the general [...] 
characteristics of all Terpsichore implementations [...] discussion of [...] 
specific choices for specific implementations. 

Super-String Pipeline 

Terpsichore [...] several instructions in each clock cycle. [...] types, one 
instruction of each type may be issued in a single cycle. The ordering required 
is A, L, E, S, B; in other words, a register-to-register address calculation, a 
memory load, a register-to-register data calculation, a 
memory store, and a branch. Because of the 
organization of the pipeline, each of these instructions may be serially dependent. 
Instructions of type E include the fixed-point execute-phase instructions as well as 
floating-point and digital signal processing instructions. We call this form of 
pipeline organization "super-string," 4 because of the ability to issue a string of 
dependent instructions in a single clock cycle, as distinguished from super-scalar 
or super-pipelined organizations, which can only issue sets of independent 
instructions. 
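
The issue rule stated above (at most one instruction of each type per cycle, in the order A, L, E, S, B) can be sketched as a greedy grouping of an instruction-type sequence into issue strings. This Python sketch is only an illustration under simplifying assumptions: the function name and the one-letter encoding are hypothetical, and latencies, operand dependencies, branch prediction, and every other implementation constraint are ignored.

```python
ISSUE_ORDER = "ALESB"  # required ordering of instruction types within one string


def pack_issue_strings(types: str) -> list[str]:
    """Split a sequence of instruction types into in-order issue strings.

    A new string starts whenever the next instruction's type does not come
    strictly later in ISSUE_ORDER than the previous instruction's type.
    """
    strings: list[str] = []
    current = ""
    last_rank = -1
    for t in types:
        rank = ISSUE_ORDER.index(t)   # ValueError for an unknown type letter
        if rank <= last_rank:         # type already used, or out of order
            strings.append(current)
            current = ""
        current += t
        last_rank = rank
    if current:
        strings.append(current)
    return strings
```

Under these assumptions, `pack_issue_strings("ALEALESB")` gives `["ALE", "ALESB"]`, while two consecutive loads, as in `"LLE"`, split into `["L", "LE"]`.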

These instructions take from two to five cycles of latency to execute, and a branch 
prediction mechanism is used to keep the pipeline filled. The diagram below 
shows a box for the interval between issue of each instruction and the completion. 


4 Readers with a background in theoretical physics may have seen this term in another, 
unrelated, context. 

Bold letters mark the critical latency paths of the instructions, that is, the periods 
between the required availability of the source registers and the earliest 
availability of the result registers. The A-L critical latency path is a special case, in 
which the result of the A instruction may be used as the base register of the L 
instruction without penalty. E instructions may require additional cycles of latency 
for certain operations, such as fixed-point multiply and divide, floating-point and 
digital signal processing operations. 



Terpsichore [...] to the organization defined above, in which the time [...] 
pipeline [...] service load operations may be flexibly extended. Thus, the front 
of the pipeline, in which A, L and B type instructions are handled, is decoupled 
from the back of the pipeline, in which E and S type instructions are handled. 
This decoupling occurs at the point at which the [...] cache and its backing 
memory are referenced; similarly, a FIFO that is [...] the instruction fetch 
unit decouples instruction cache references from the front of the pipeline shown 
above. The depth of the FIFO structures is 
implementation-dependent, i.e. not fixed by the architecture. 

The diagram below indicates why we call this pipeline organization feature 
"super-spring," an extension of our super-string organization. 



Super-spring pipeline 

With the super-spring organization, the latency of load instructions can be 
extended, so execute instructions are deferred until the results of the load are 
available. Nevertheless, the execution unit still processes instructions in normal 
order, and provides precise exceptions. 



Branch/Fetch Prediction 

Terpsichore does not have delayed branch instructions, and so relies upon branch 
or fetch prediction to keep the pipeline full around unconditional and conditional 
branch instructions. The hardware prediction mechanism is tuned for optimizing 
conditional branches that close loops or express frequent alternatives, and will 
generally require substantially more cycles when executing conditional branches 
whose outcome is not predominately taken or not-taken. For such cases, the use of 
code which avoids conditional branches in favor of the use of set on compare and 
multiplex instructions may result in greater performance. 
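
The idea of replacing a poorly predicted conditional branch with a compare-and-select sequence can be sketched in Python. The sketch below is an illustration only: the helper names are hypothetical, and the assumption that a set-on-compare produces an all-ones or all-zeros 64-bit mask is made for the example, not taken from this section.

```python
MASK64 = (1 << 64) - 1


def set_unsigned_less(a: int, b: int) -> int:
    """Assumed set-on-compare result: all ones if a < b, else zero."""
    return -(a < b) & MASK64


def mux(mask: int, x: int, y: int) -> int:
    """Bitwise multiplex: take bits of x where mask is 1, bits of y elsewhere."""
    return (x & mask) | (y & ~mask & MASK64)


def branchless_min(a: int, b: int) -> int:
    """Minimum of two unsigned 64-bit values without a conditional branch."""
    return mux(set_unsigned_less(a, b), a, b)
```

The select is computed for both outcomes, so the instruction stream is the same whichever operand is smaller, which is the property that matters when the comparison outcome is hard to predict.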



and Execute Resources 




Studies of the dynamic distribution of Terpsichore instructions on the various 
benchmark suites indicate that the most frequently-issued instruction classes are 
load instructions and execute instructions. In a high-performance Terpsichore 
implementation, it is advantageous to consider execution pipelines in which the 
ability to target the machine resources toward issuing load and execute 
instructions is increased. 


One of the means to increase the ability to issue execute-class instructions is to 
provide the means to issue two execute instructions in a single-issue string. The 
execution unit actually requires several distinct resources, so by partitioning these 
resources, the issue capability can be increased without increasing the number of 
functional units, other than the increased register file read and write ports. The 
partitioning favored for the initial implementation places all instructions that 
involve shifting, including dealing and shuffling, in one execution unit, and all 
instructions that involve multiplication, including fixed-point and floating-point 
multiply and add, in another unit. Resources used for implementing add, subtract, 
and bitwise logical operations may be duplicated, being modest in size compared 
to the shift and multiply units, or shared between the two units, as the operations 
have low-enough latency that two operations might be pipelined within a single 
issue cycle. These instructions must generally be independent, except perhaps 
that two simple add, subtract, or bitwise logical instructions may be performed 
dependently, if the resources for executing simple instructions are shared between 
the execution units. 

One of the means to increase the ability to issue load [...] instructions is to 
provide the means to issue two load instructions [...] single-issue string. This 
would generally increase the resources required [...] fetch unit and the data 
cache, but a compensating solution [...] for the store instruction to execute 
the second load [...]. [...] single-issue string can then contain either two 
load instructions [...] and one store instruction, which uses the same [...] 
address computation resources as the basic 5-instruction [...]. 

Result Forwarding 

When temporally adjacent [...] separate resources, the results of the first 
instruction [...] directly to the resource used to execute [...]. [...] places a 
value which may have been [...] paths use significant resources. A [...] provide 
forwarding resources so that [...] within a string are immediately forwarded 
[...] except between a first and second execution instruction. [...] when 
forwarding results from the [...] unit, additional delay may be incurred. 

Instruction Set 

All instructions are 32 bits in size, and use the high order 8 bits to specify a major 
operation code. 

 31      24 23                       0 
|  major   |          other           | 

For the major operation field values A.MINOR, L.MINOR, E.MINOR, F.16, F.32, 
F.64, F.128, GF.16, GF.32, GF.64, G.1, G.2, G.4, G.8, G.16, G.32, G.64, S.MINOR 
and B.MINOR, the lowest-order six bits in the instruction specify a minor 
operation code: 

 31      24 23           6 5        0 
|  major   |    other     |  minor   | 
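
The field layout above can be expressed as shifts and masks. The following Python sketch splits a 32-bit instruction word into the major (bits 31..24), other (bits 23..6), and minor (bits 5..0) fields; the function name and the dictionary return type are illustrative choices, not part of the architecture definition.

```python
def split_instruction(inst: int) -> dict[str, int]:
    """Split a 32-bit instruction word into major / other / minor fields."""
    if not 0 <= inst < (1 << 32):
        raise ValueError("instruction must fit in 32 bits")
    return {
        "major": (inst >> 24) & 0xFF,     # high-order 8 bits
        "other": (inst >> 6) & 0x3FFFF,   # bits 23..6 (18 bits)
        "minor": inst & 0x3F,             # lowest-order 6 bits
    }
```

Whether the minor field is meaningful depends on the major operation code, as listed above; for other major codes the low 24 bits are interpreted as a single field.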


5 Blank table entries cause the Reserved Instruction exception to occur. 
The minor field is filled with a value from one of the following tables: 



0 

8 


' 24 

32 

40 

48 

56 ' 

0 



AAND 









AOR 






2 



AXOR 






3 



AANDN 







AADD 

ASUB 

ANAND 





/ ASH LI 

5 



ANOR 






e 



AXNOR 





ASHRI 

7 



AORN 





\ AUSHRI 



GF.stee 
0 






40 


S6 '" 



; «£FSUK§ 


4'^ UPC 

, -QFSUB 

GFADD.X 
GFSUB.X 

GFSETE 
GFSETNE 

GFSETE.X 
GFSETNE.X 

... 2 r 





GFMUL 

GFMULJC 

GFsenuE 

GFSETUE.X 



GF.UNARY.T 


GF.UNA^Y.C 

GFDIV 
GF.UNARY 

GFDIV.X 

GFSETNUE 
GFSE7NUGE 
GFSETUGE 

GFSFTNUEJC 
GFSETLX 
GFSETNL.X 






... 

/ 
/,', 


GFSETUL 
GFSETNUL 

GFSETNGE.X 
GFSETGE.X 


minor operation code field values for GF.size 


0 

0 

GSETE 

8 

16 
GAND 



40 

48 

56 


GSETNE 


GOR 


G6WAP- 

<fZ£>TL 

GMUL 
GUMUL 

SCOMPBESSI 

2 

GSETL 


GXOR 




GDIV 

GEXPANDI 

3 

GSETGE 
GADD 

GSUB 

GANDN 
GNAND 


GGiHorrtE 


GUDIV 

GUEXPANDI 

5 



GNOR 

GSHL 



&&&& 

V/'GSHCl 

6 

GSETUL 


GXNOR 

V GBFIfi 

GjOATHEB- 



i/ GSHffl 

7 

GSETUGE 


GORN 

/ GJJ8F1R 






minor operation code field values for G.size 
0 

LU16LA 

L16LA 

L64LA 

C5 






LU16BA 

L16BA 

L64BA 

LU8 





2 

LU16L 

L16L 

L64L 







LU16B 

L16B 

L64B 







LU32LA 

L32LA 

L128U 






5 

LU32BA 

L32BA 

L128BA 






6 

LU32L 









LU32B 

L32B 

L128B 







minor operation code field values for L.MINOR 


-sBnrasr 

0 



24 

32 



— 5e — 

0 

SAAS64LA 

S16LA 


§8 






SAAS64BA 

S16BA 

S64BA 



— 



2 

SCAS64LA 

S16L 

S64L 






3 

SCAS64BA 

S16B 

S64B 







SMAS64LA 

S32LA 

S128LA 






5 

SMAS64BA 

S32BA 

S128BA 






6 

SMUX64LA 

S32L 

S128L 






7 

SMUX648A 

S32B 

S128B 








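For the major operation field values F.16, F.32, F.64, F.128, with minor operation 
field values F.UNARY.N, F.UNARY.T, F.UNARY.F, F.UNARY.C, F.UNARY, 
F.UNARY.X, and for major operation field values GF.16, GF.32, GF.64, with 
minor operation field values GF.UNARY.N, GF.UNARY.T, GF.UNARY.F, 
GF.UNARY.C, GF.UNARY, GF.UNARY.X, another six bits in the [...] 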


The unary field is filled with a value from one of the following tables: 
unary operation code field values for GF.UNARY.size.r 


r 


offset 


le are one of 
0 


8 

24 ' , ' ■ "' 


31 24 23 18. 4 


0 

1 major | ra ^ 


_ I 

8 a "V^ 




31 24 23 18 17 ' ' 12 11 

I major fry M~TT 7Trrr~~"^grt" 
T— T— ."rH — 


The general forms of the instructions coded by major and minor operation codes 
are one of the following: 

 31      24 23    18 17    12 11     6 5      0 
|  major   |   ra   |   rb   |   rc   | minor  | 

 31      24 23    18 17    12 11     6 5      0 
|  major   |   ra   |  simm  |   rc   | minor  | 

The general form of the instructions coded by major, minor, and unary operation 
codes is the following: 

|  major   |   ra   | unary  |   rc   | minor  | 
simm <- rb <- inst17..12 
rc <- inst11..6 
minor <- rd <- inst5..0 
case major of 
    A.MINOR: 
        case minor of 
            A.RES: 
                AlwaysReserved 
            A.ADD, A.AND, A.ANDN, A.NAND, A.NOR, 
            A.OR, A.ORN, A.SUB, A.XNOR, A.XOR: 
                Address(minor,ra,rb,rc) 
            A.SHL.I, A.SHR.I, A.USHR.I: 
                AddressShortImmediate(minor,ra,simm,[...] 
            others: 
                raise ReservedInstruction 
        endcase 
    A.ADD.I, A.AND.I, A.OR.I, A.NA[...] 
        AddressImmediate(major,r[...] 
    A.COPY.[...] 
        AddressCopyImmedia[...] 
    E.MINOR: 
        case minor of 
            E.ADD, E.ADD.SO, E.AND, E.ANDN, [...] 
            E.OR, E.ORN, E.S[...] 
            E.SH[...] 
            [...]XPAND, 
            [...]UM, 
            [...] E.SET.UL, E.SET.UGE, 
            [...] E.SUB.UL, E.SUB.UGE: 
                [...] 
        [...] 
    E.ADD.I, E.ADD.[...] E.NOR.I, E.XOR.I, 
    E.SET.I.E, E.SET[...]NE, E.SET.I.L, E.SET.I.GE, E.SET.I.UL, E.SET.I.UGE, 
    E.SUB.I.E, E.SU[...] E.SUB.I.GE, E.SUB.I.UL, E.SUB.I.UGE: 
        ExecuteImmediate(major,ra,[...]11..0) 
    [...] 
        ExecuteTernary(major,ra,rb,rc,rd) 
    E.COPY.I: 
        ExecuteCopyImmediate(major,ra,inst17..0) 
    FMULADD16, FMULADD32, FMULADD64, FMULADD128, 
    FMULSUB16, FMULSUB32, FMULSUB64, FMULSUB128: 
        FloatingPointTernary(major,ra,rb,rc,rd) 
    F.16, F.32, F.64, F.128: 
        case minor of 
            F.ADD.N, F.SUB.N, F.MUL.N, F.DIV.N, 
            F.ADD.T, F.SUB.T, F.MUL.T, F.DIV.T, 
            F.ADD.F, F.SUB.F, F.MUL.F, F.DIV.F, 
            F.ADD.C, F.SUB.C, F.MUL.C, F.DIV.C, 
            F.ADD, F.SUB, F.MUL, F.DIV, 
            F.ADD.X, F.SUB.X, F.MUL.X, F.DIV.X, 
            F.SET.E, F.SET.NE, F.SET.UE, F.SET.NUE, 
            F.SET.NUGE, F.SET.UGE, F.SET.UL, F.SET.NUL, 
            F.SET.E.X, F.SET.NE.X, F.SET.UE.X, F.SET.NUE.X, 
            F.SET.L.X, F.SET.NL.X, F.SET.NGE.X, F.SET.GE.X: 
                FloatingPoint(minor.op, major.size, minor.round, ra, rb, rc) 
            F.UNARY.N, F.UNARY.T, F.UNARY.F, F.UNARY.C, 
            F.UNARY, F.UNARY.X: 
                case unary of 
                    F.ABS, F.NEG, F.SQR, 
                    F.HALF, F.SINGLE, F.DOUBLE, F.QUAD, 
                    F.INT, F.FLOAT: 
                        FloatingPointUnary(unary.op, major.size, minor.round, 
                            ra, rc) 
                    others: 
                        raise ReservedInstruction 
                endcase 
            others: 
                raise ReservedInstruction 
        endcase 

    GMULADD1, GMULADD2, GMULADD4, 
    GMULADD8, GMULADD16, GMULADD32, 
    GUMULADD2, GUMULADD4, 
    GUMULADD8, GUMULADD16, [...] 
    GMUX, GMUXGATHER, [...] 
        GroupTernary(major,[...] 
    G.EXTRACT.I, G.EXTRA[...] 
        GroupExtractImmedia[...] 
    G.1, G.2, G.4, G.[...] 
        case minor [...] 
            [...] G.UMUL, 
            [...]R, G.XNOR, G.ORN, 
            [...]SET.UL, G.SET.UGE, 
            [...]PRESS, G.EXPAND, 
            [...] 
    [...]D64, 
    [...]32, GFMULSUB64: 
        GroupFloatingPointTernary(major,ra,rb,rc,rd) 
    [...] GF.32, GF.64, GF.128: 
        case minor of 
            GF.ADD.N, GF.SUB.N, GF.MUL.N, GF.DIV.N, 
            GF.ADD.T, GF.SUB.T, GF.MUL.T, GF.DIV.T, 
            GF.ADD.F, GF.SUB.F, GF.MUL.F, GF.DIV.F, 
            GF.ADD.C, GF.SUB.C, GF.MUL.C, GF.DIV.C, 
            GF.ADD, GF.SUB, GF.MUL, GF.DIV, 
            GF.ADD.X, GF.SUB.X, GF.MUL.X, GF.DIV.X, 
            GF.SET.E, GF.SET.NE, GF.SET.UE, GF.SET.NUE, 
            GF.SET.NUGE, GF.SET.UGE, GF.SET.UL, GF.SET.NUL, 
            GF.SET.E.X, GF.SET.NE.X, GF.SET.UE.X, GF.SET.NUE.X, 
            GF.SET.L.X, GF.SET.NL.X, GF.SET.NGE.X, GF.SET.GE.X: 
                GroupFloatingPoint(minor.op, major.size, minor.round, ra, rb, rc) 
            GF.UNARY.N, GF.UNARY.T, GF.UNARY.F, GF.UNARY.C, 
            GF.UNARY, GF.UNARY.X: 
                case unary of 
                    GF.ABS, GF.NEG, GF.SQR, 
                    GF.HALF, GF.SINGLE, GF.DOUBLE, GF.QUAD, 
                    GF.INT, GF.FLOAT: 
                        GroupFloatingPointUnary(unary.op, major.size, 
                            minor.round, ra, rc) 
                    others: 
                        raise ReservedInstruction 
                endcase 
            others: 
                raise ReservedInstruction 
        endcase 
    L.MINOR: 
        case minor of 
            L16L, LU16L, L32L, LU32L, L64L, L128L, L8[...] 
            L16LA, LU16LA, L32LA, LU32LA, L64LA, [...] 
            L16B, LU16B, L32B, LU32B, L64B, L128[...] 
            L16BA, LU16BA, L32BA, LU32BA, L6[...] 
                Load(minor,ra,rb,rc) 
            others: 
                raise ReservedInstruction 
        endcase 
    L16LI, LU16LI, L32LI, LU3[...] 
    L16LAI, LU16LAI, L32LAI, [...]LAI, 
    L16BI, LU16BI, L32BI, LU32BI, [...] 
    L16BAI, LU16BAI, [...] 
        LoadImmediat[...] 
    S.MINOR 
    [...] 
    [...] SM64LAI, 
    [...]32BI, S64BI, S128BI, 
    S16BAI, S32BAI, S64BAI, S128BAI 
    SAAS64BAI, SCAS64BAI, SMAS64BAI, SM64BAI: 
        StoreImmediate(major,ra,rb,inst11..0) 
    B.MINOR: 
        case minor of 
            B, B.LINK, B.DOWN: 
                Branch(minor,ra,rb) 
            others: 
                raise ReservedInstruction 
        endcase 
    BLINKI, BI: 
        BranchImmediate(major,inst23..0) 
    BFE16, BFNE16, BFUE16, BFNUE16, 
    BFNUGE16, BFUGE16, BFUL16, BFNUL16, 
    BFE32, BFNE32, BFUE32, BFNUE32, 
    BFNUGE32, BFUGE32, BFUL32, BFNUL32, 
    BFE64, BFNE64, BFUE64, BFNUE64, 
    BFNUGE64, BFUGE64, BFUL64, BFNUL64, 
    BFE128, BFNE128, BFUE128, BFNUE128, 
    BFNUGE128, BFUGE128, BFUL128, BFNUL128, 
    BE, BNE, BL, BGE, BUL, BUGE, 
    BANDE, BANDNE, BANDL, BANDGE, BANDG, BANDLE: 
        BranchConditional(major,inst[...]) 
    BGATE: 
        BranchGateway(ra,rb,inst11..0) 
Always Reserved 

This operation generates a reserved instruction exception. 

Operation codes 

A.RES    Always reserved 

Format 

A.RES imm 

 31      24 23                       0 
|  A.RES   |           imm            | 

Description 

The reserved instruction exception is raised. Software may depend upon this 
major operation code raising the reserved instruction exception in all Terpsichore 
processors. The choice of operation code intentionally ensures that a branch to a 
zeroed memory area will raise an exception. 

Definition 

def AlwaysReserved as 
    raise ReservedInstruction 
enddef 

Exceptions 

Reserved Instruction 
Address 

These operations perform calculations with two general register values, placing 
the result in a general register. 

Operation codes 

A.ADD     Address add 
A.AND     Address and 
A.ANDN    Address and not 
A.NAND    Address not and 
A.NOR     Address not or 
A.OR      Address or 
A.ORN     Address or not 
A.XNOR    Address exclusive nor 
A.XOR     Address xor 

Format 

op  rc=ra,rb 

 31      24 23    18 17    12 11     6 5      0 
| A.MINOR  |   ra   |   rb   |   rc   |   op   | 
     8         6        6        6        6 

Description 

The contents of registers ra and rb are fetched and the specified operation is 
performed on these operands. The result is placed into register rc. 

Definition 

def Address(op,ra,rb,rc) as 
    a <- REG[ra] 
    b <- REG[rb] 
    case op of 
        A.ADD: 
            c <- a + b 
        A.AND: 
            c <- a and b 
        A.OR: 
            c <- a or b 
        A.XOR: 
            c <- a xor b 
        A.ANDN: 
            c <- a and not b 
        A.NAND: 
            c <- not (a and b) 
        A.NOR: 
            c <- not (a or b) 
        A.XNOR: 
            c <- not (a xor b) 
        A.ORN: 
            c <- a or not b 
    endcase 
    REG[rc] <- c 
enddef 

Exceptions 

none 
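
The Address definition can be mirrored in a few lines of Python. This is a sketch under stated assumptions: registers are modeled as 64-bit unsigned integers, results are reduced modulo 2^64, and the table and function names are hypothetical. Only the opcode mnemonics and the boolean expressions come from the definition.

```python
MASK64 = (1 << 64) - 1

# One entry per operation code; each takes the two fetched operands a and b.
ADDRESS_OPS = {
    "A.ADD":  lambda a, b: a + b,
    "A.AND":  lambda a, b: a & b,
    "A.OR":   lambda a, b: a | b,
    "A.XOR":  lambda a, b: a ^ b,
    "A.ANDN": lambda a, b: a & ~b,
    "A.NAND": lambda a, b: ~(a & b),
    "A.NOR":  lambda a, b: ~(a | b),
    "A.XNOR": lambda a, b: ~(a ^ b),
    "A.ORN":  lambda a, b: a | ~b,
}


def address(op: str, a: int, b: int) -> int:
    """Apply one Address operation and keep the low 64 bits of the result."""
    return ADDRESS_OPS[op](a, b) & MASK64
```

Masking once at the end keeps Python's unbounded integers, including the negative values produced by `~`, within the assumed 64-bit register width.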
Address Copy Immediate 

This operation produces one immediate value, placing the result in a general 
register. 

Operation codes 


REDACTED 


Address Immediate 


These operations perform calculations with one general register value and one 
immediate value, placing the result in a general register. 

Operation codes 

A.ADD.I     Address add immediate 
A.AND.I     Address and immediate 
A.NAND.I    Address nand immediate 
A.NOR.I     Address nor immediate 
A.OR.I      Address or immediate 
A.SUB.I     Address subtract immediate 
A.XOR.I     Address xor immediate 


Format 

op  rb=ra,imm 

|    op    |   ra   |   rb   |     imm      | 

Description 

The contents of register ra are fetched, and a 64-bit immediate value is 
sign-extended from the 12 bit imm field. The specified operation is performed on 
these operands. The result is placed into register rb. 

Definition 

[...] 
    case op of 
        A.AND.I: 
            b <- a and i 
        A.OR.I: 
            b <- a or i 
        A.NAND.I: 
            b <- a nand i 
        A.NOR.I: 
            b <- a nor i 
        A.XOR.I: 
            b <- a xor i 
        A.ADD.I: 
            b <- a + i 
    endcase 
    REG[rb] <- b 
enddef 

Exceptions 

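
The sign extension of the 12-bit immediate used by the Address Immediate operations can be sketched as follows. This Python fragment is an illustration, assuming 64-bit registers with wraparound arithmetic; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def address_add_immediate(a: int, imm12: int) -> int:
    """Sketch of A.ADD.I: add the sign-extended 12-bit immediate to a."""
    return (a + sign_extend(imm12, 12)) & MASK64
```

With this encoding an immediate field of 0xFF0 represents -16, so `address_add_immediate(16, 0xFF0)` gives 0.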
Address Immediate Reversed 

These operations perform calculations with one general register value and one 
immediate value, placing the result in a general register. 

Operation codes 

A.SUB.I    Address subtract immediate 

Format 

[...] 

Description 

[...] immediate value is sign-extended [...] operation is performed on these 
[...] 
Address Reversed 

These operations perform calculations with two general register values, placing 
the result in a general register. 

Operation codes 

A.SUB    Address subtract 

Format 

op  rc=rb,ra 

 31      24 23    18 17    12 11     6 5      0 
| A.MINOR  |   ra   |   rb   |   rc   |   op   | 
     8         6        6        6        6 

Description 

The contents of registers ra and rb are fetched and the specified operation is 
performed on these operands. The result is placed into register rc. 

Definition 

def AddressReversed(op,ra,rb,rc) as 
    a <- REG[ra] 
    b <- REG[rb] 
    case op of 
        A.SUB: 
            c <- b - a 
    endcase 
    REG[rc] <- c 
enddef 

Exceptions 

none 
Address Short Immediate 

These operations perform calculations with one general register value and one 
immediate value, placing the result in a general register. 

Operation codes 

A.SHL.I     Address shift left immediate 
A.SHR.I     Address signed shift right immediate 
A.USHR.I    Address unsigned shift right immediate 

[...] immediate value is taken from [...] on these operands. The [...] 

[...] 
    endcase 
    REG[rc] <- c 
enddef 

Exceptions 

none 
Branch 

This operation branches to a location specified by a register, optionally reducing 
the current privilege level. 

Operation codes 

B         Branch 
B.DOWN    Branch down in privilege 

[...] placed into register [...] contents of register ra. If [...] level 
specified by the low [...] 

[...] ra is not aligned on a [...] the low-order two bits [...] 

[...] 
        endif 
        if (REG[ra] and 3) ≠ 0 then 
            raise AccessDisallowedByVirtualAddress 
        endif 
    endif 
    ProgramCounter <- REG[ra]63..2 || 0^2 
enddef 

Exceptions 

Access disallowed by virtual address 
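
The legible tail of the Branch definition (an alignment check on the low-order two bits, then a program counter formed from bits 63..2 of the register with two zero bits appended) can be sketched as below. This Python fragment covers only that part: the exception class and function name are stand-ins, the treatment of B.DOWN and privilege levels is omitted because that portion of the definition is not legible, and applying the check unconditionally is an assumption.

```python
MASK64 = (1 << 64) - 1


class AccessDisallowedByVirtualAddress(Exception):
    """Stand-in for the architectural exception of the same name."""


def branch_target(reg_value: int) -> int:
    """New program counter for a register-indirect branch.

    Raises if the target is not aligned on a four-byte boundary; otherwise
    returns bits 63..2 of the register followed by two zero bits.
    """
    if (reg_value & 3) != 0:
        raise AccessDisallowedByVirtualAddress(hex(reg_value))
    return reg_value & MASK64 & ~3
```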
Branch and Link 

This operation branches to a location specified by a register, saving the value of 
the program counter into a register. 

Operation codes 

[...] ra is not aligned on a [...] of the low-order two bits [...] 

[...] 
    REG[rb] <- ProgramCounter + 4 
    ProgramCounter <- REG[ra]63..2 || 0^2 
enddef 

Exceptions 

Access disallowed by virtual address 
Branch Conditionally 

These operations compare two operands, and depending on the result of that 
comparison, conditionally branch to a nearby code location. 

Operation codes 

B.AND.E       Branch and equal to zero 
[...] 
B.F.UE.16     Branch floating-point unordered or equal half 
B.F.UE.32     Branch floating-point unordered or equal single 
B.F.UE.64     Branch floating-point unordered or equal double 
B.F.UE.128    Branch floating-point unordered or equal quad 
B.F.UGE.16    Branch floating-point unordered greater or equal half 
B.F.UGE.32    Branch floating-point unordered greater or equal single 
B.F.UGE.64    Branch floating-point unordered greater or equal double 
B.F.UGE.128   Branch floating-point unordered greater or equal quad 
B.F.UL.16     Branch floating-point unordered or less half 
B.F.UL.32     Branch floating-point unordered or less single 
B.F.UL.64     Branch floating-point unordered or less double 
B.F.UL.128    Branch floating-point unordered or less quad 
B.GE          Branch signed greater or equal 
B.L           Branch signed less 
B.NE 7        Branch not equal 
B.U.GE        Branch unsigned greater or equal 
B.U.L         Branch unsigned less 


number format       type    compare                          size 

signed integer              E NE L GE 
unsigned integer    U       E 8 NE 9 L GE 
bitwise and         AND     E NE L GE G LE 
floating-point      F       E NE UE NUE NUGE UGE UL NUL      16 32 64 128 

[...] specified by the op field. If [...] branches to the address specified 
[...] continues at the next sequential [...] 

[...] size specified by the op field is 128, the even-odd pairs of registers 
specified [...] and rb are compared. In such a case, ra0 and rb0 must be zero 
for the instruction to be valid. 

7 B.NE suffices for both signed and unsigned comparison for inequality. 

8 B.U.E implemented as B.E. 

9 B.U.NE implemented as B.NE. 

Definition 

def BranchConditional(op,ra,rb,offset) as 
    case op of 
        BFE16, BFNE16, BFUE16, BFNUE16, 
        BFNUGE16, BFUGE16, BFUL16, BFNUL16, 
        BFE32, BFNE32, BFUE32, BFNUE32, 
        BFNUGE32, BFUGE32, BFUL32, BFNUL32, 
        BFE64, BFNE64, BFUE64, BFNUE64, 
        BFNUGE64, BFUGE64, BFUL64, BFNUL64, 
        BFE128, BFNE128, BFUE128, BFNUE128, 
        BFNUGE128, BFUGE128, BFUL128, BFNUL128: 
            type <- F 
        BE, BNE, BL, BGE: 
            type <- NONE 
        BUL, BUGE: 
            type <- U 
        BANDE, BANDNE, BANDL, BANDGE, BANDG, BANDLE: 
            type <- AND 
    endcase 
    case op of 
        B.AND.G: 
            compare <- G 
        B.AND.LE: 
            compare <- LE 
        B.AND.E, B.E, B.F.E.16, B.F.E.32, B.F[...] 
            compare <- E 
        B.AND.GE, B.GE, B.U.GE: 
            compare <- GE 
        B.AND.L, B.L, B.U.L: 
            compare <- L 
        B.AND.NE, B.NE, B.F.NE.16, B.F.N[...] 
            compare <- NE 
        [...] 
        B.F.UL.16, [...] 
            [...] 
    endcase 
    case op of 
        BFE16, BF[...] 
            size <- 16 
        BFE32, BFNE32, BFUE32, BFNUE32, 
        BFNUGE32, BFUGE32, BFUL32, BFNUL32: 
            size <- 32 
        BFE64, BFNE64, BFUE64, BFNUE64, 
        BFNUGE64, BFUGE64, BFUL64, BFNUL64: 
            size <- 64 
        BFE128, BFNE128, BFUE128, BFNUE128, 
        BFNUGE128, BFUGE128, BFUL128, BFNUL128: 
            size <- 128 
        BE, BNE, BL, BGE, BUL, BUGE, 
        BANDE, BANDNE, BANDL, BANDGE, BANDG, BANDLE: 
            size <- undefined 
    endcase 
    case type of 
        NONE: 
            l <- REG[rb] [...] 
            [...] 
        U: 
            l <- 0 || REG[rb] 
            r <- 0 || REG[ra] 
        AND: 
            l <- REG[ra] and REG[rb] 
            r <- 0 
        F: 
            case size of 
                16: 
                    r <- F16(REG[ra]) 
                    l <- F16(REG[rb]) 
                32: 
                    r <- F32(REG[ra]) 
                    l <- F32(REG[rb]) 
                64: 
                    r <- F64(REG[ra]) 
                    l <- F64(REG[rb]) 
                [...] 
    [...] 
    endif 
    if c then 
        PC <- PC + (offset11^50 || offset || 0^2) 
    endif 
enddef 

Exceptions 

Reserved Instruction 
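
The final step of the definition, in which a taken branch adds the sign-extended 12-bit offset (with two zero bits appended) to the program counter, can be sketched in Python. The function names are hypothetical, the comparison itself is reduced to a boolean argument, and the not-taken case is modeled as PC + 4 on the basis that all instructions are 32 bits in size.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def next_pc(pc: int, offset12: int, taken: bool) -> int:
    """Program counter after a conditional branch at address `pc`."""
    if not taken:
        return (pc + 4) & MASK64            # next sequential instruction
    # offset11^50 || offset || 0^2 is the sign-extended offset times four
    return (pc + (sign_extend(offset12, 12) << 2)) & MASK64
```

Because the offset counts instructions rather than bytes, the reachable range under these assumptions is 2048 instructions backward to 2047 instructions forward of the branch.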
Branch Gateway Immediate 

This operation provides a secure means to call a procedure, including those at a 
higher privilege level. 

Operation codes 

B.GATE.I    Branch gateway immediate 

Format 

B.GATE.I ra,imm 

 31      24 23 [...] 
| B.GATE.I | [...] 

Description 

A virtual address is computed from the [...] contents of register 1 and the 
sign-extended value of the offset field. The [...] of [...] bytes of memory 
using the big-endian byte order [...] to the contents of the memory data, and 
[...] counter, catenated with the current [...]. 

[...] address is a higher [...] set for the gateway [...] boundary. 

[...] is not equal to one, or the rb [...] 3..0 and bits 11..6 are not equal 
[...] 

Definition 

def BranchGatewayImmediate(ra,rb,imm) as 
    VirtAddr = REG[ra] + (imm11^50 || imm) 
    if VirtAddr3..0 ≠ 0 then 
        raise AccessDisallowedByVirtualAddress 
    endif 
    if (ra ≠ 1) or (rb ≠ 0) or (imm11..6 ≠ 0) or (imm3..0 ≠ 0) then 
        raise ReservedInstruction 
    endif 
    b <- LoadMemory(VirtAddr,128,B) 
    bx <- 0^64 || ProgramCounter63..2+1 || PrivilegeLevel 
    ProgramCounter <- b63..2 || 0^2 
    PrivilegeLevel <- LoadProtection(VirtAddr,128,B) 
    REG[rb] <- bx 
enddef 
Exceptions 

Reserved Instruction 
Access disallowed by virtual address 
Access disallowed by tag 
Access disallowed by global TLB 
Access disallowed by local TLB 
Access detail required by tag 
Access detail required by local TLB 
Access detail required by global TLB 
Cache coherence intervention required by tag 
Cache coherence intervention required by local TLB 
Cache coherence intervention required by global TLB 
Local TLB miss 
Global TLB miss 
Branch Immediate 

This operation branches to a location that is specified as an offset from the 
program counter, optionally saving the value of the program counter into register 
0. 

Operation codes 

B.I         Branch immediate 
B.LINK.I    Branch immediate and link 

Format 

op  target 

 31      24 23 [...] 

Description 

If requested, the address of the instruction following this one is placed into 
register 0. Execution branches to the address specified by the offset field. 

Definition 

def BranchImmediate(op,offset) as 
    if (op = B.LINK.I) then 
        REG[0] <- PC + 4 
    endif 
    PC <- (offset23^38 || offset || 0^2) 
enddef 

Exceptions 

none 
Execute 

These operations perform calculations with two general register values, placing 
the result in a general register. 

Operation codes 

E.ADD         Execute add 
E.ADD.SO      Execute add and check signed overflow 
E.AND         Execute and 
E.ANDN        Execute and not 
E.ALMS        Execute and logarithm of most significant bit 
E.ASUM        Execute and summation of bits 
E.NAND        Execute not and 
E.NOR         Execute not or 
E.OR          Execute or 
E.ORN         Execute or not 
E.SHL         Execute shift left 
E.SHL.SO      [...] 
E.SHR         [...] 
E.U.SHR       Execute unsigned shift right 
E.XNOR        Execute exclusive nor 
E.XOR         Execute xor 
E.GATHER      Execute gather 
E.SCATTER     Execute scatter 
E.EXPAND      Execute signed expand 
E.U.EXPAND    Execute unsigned expand 

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Description 

The contents of registers ra and rb are fetched and the specified operation is 
performed on these operands. The result is placed into register rc. 

Definition 

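def Execute(op,ra,rb,rc) as
    a <- REG[ra]
    b <- REG[rb]
    case op of
        E.EXPAND:
            c <- a_{31}^{32-b_{4..0}} || a_{31..0} || 0^{b_{4..0}}
        E.U.EXPAND:
            c <- 0^{32-b_{4..0}} || a_{31..0} || 0^{b_{4..0}}
        E.SHL:
            c <- a_{63-b_{5..0}..0} || 0^{b_{5..0}}
        E.SHL.SO:
            if a_{63..63-b_{5..0}} ≠ a_{63}^{b_{5..0}+1} then
                raise FixedPointArithmetic
            endif
            c <- a_{63-b_{5..0}..0} || 0^{b_{5..0}}
        E.SHR:
            c <- a_{63}^{b_{5..0}} || a_{63..b_{5..0}}
        E.U.SHR:
            c <- 0^{b_{5..0}} || a_{63..b_{5..0}}
        E.ADD:
            c <- a + b
        E.ADD.SO:
            t <- (a_{63} || a) + (b_{63} || b)
            if t_{64} ≠ t_{63} then
                raise FixedPointArithmetic
            endif
            c <- t_{63..0}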

        E.AND:
            c <- a and b
        E.OR:
            c <- a or b
        E.XOR:
            c <- a xor b
        E.ANDN:
            c <- a and not b
        E.NAND:
            c <- not (a and b)
        E.NOR:
            c <- not (a or b)
        E.XNOR:
            c <- not (a xor b)
        E.ORN:
            c <- a or not b

        E.ALMS:
            for i <- 0 to 63
                if t_{63..i} = 0^{63-i} || 1 then
                    c <- i
                endif
            endfor
            endif
        E.ASUM:
            t <- a & b
            u <- (t_{63..1} & 0x5555555555555555) + (t & 0x5555555555555555)
            v <- (u_{63..2} & 0x3333333333333333) + (u & 0x3333333333333333)
            w <- (v_{63..4} & 0x0707070707070707) + (v & 0x0707070707070707)
            x <- (w_{63..8} & 0x000f000f000f000f) + (w & 0x000f000f000f000f)
            c <- x_{52..48} + x_{36..32} + x_{20..16} + (x & 0x1f)
E.GATHER: 



            for i <- 63 to 0 by -1
                if ~a_{i} then
                    c_{j} <- b_{i}
                    j <- j - 1
                endif
            endfor
    endcase
    REG[rc] <- c
enddef 

Exceptions 
Fixed-point arithmetic 
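
The E.ASUM steps can be read as a pairwise bit-count of (a and b). The following is a minimal sketch in Python, not part of the architecture definition: the function name e_asum and the helper MASK64 are hypothetical, and the final 0x1f mask is an assumption about a constant that is only partly legible in the definition.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper: keeps values to 64 bits


def e_asum(a: int, b: int) -> int:
    """Count the one bits of (a & b) using the pairwise sums of E.ASUM."""
    t = a & b & MASK64
    # Sum adjacent bits into 2-bit fields, then 4-bit, 8-bit and 16-bit fields.
    u = ((t >> 1) & 0x5555555555555555) + (t & 0x5555555555555555)
    v = ((u >> 2) & 0x3333333333333333) + (u & 0x3333333333333333)
    w = ((v >> 4) & 0x0707070707070707) + (v & 0x0707070707070707)
    x = ((w >> 8) & 0x000F000F000F000F) + (w & 0x000F000F000F000F)
    # Add the four 5-bit partial counts held in bits 52..48, 36..32, 20..16, 4..0.
    return ((x >> 48) & 0x1F) + ((x >> 32) & 0x1F) + ((x >> 16) & 0x1F) + (x & 0x1F)
```

Each mask is applied before the addition, so a field never overflows into its neighbour: a 16-bit field holds at most 16, which fits in the 5 bits that the last step extracts.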
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Execute Copy Immediate 

This operation produces one immediate value, placing the result in a general 
register. 

Operation codes 

E.COPY.I    Execute copy immediate

Format

E.COPY.I ra=imm

31 24 23 18 17 0

| E.COPY.I | ra | imm |

Description

A 64-bit immediate value is sign-extended from the 18-bit imm field. The result is
placed into register ra.

Definition

def ExecuteCopyImmediate(op,ra,imm) as
    i <- (imm_{17}^{46} || imm)
    REG[ra] <- i
enddef

Exceptions

none
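
The sign extension described above replicates bit 17 of the imm field into the upper 46 bits of the result. A minimal sketch in Python, assuming registers are modelled as unsigned 64-bit integers; the names sign_extend and e_copy_i are hypothetical and are not taken from the architecture:

```python
def sign_extend(value: int, width: int, out_width: int = 64) -> int:
    """Replicate bit (width-1) of value into bits width..out_width-1."""
    value &= (1 << width) - 1
    if value >> (width - 1):
        # Sign bit set: fill the upper (out_width - width) bits with ones.
        value |= ((1 << (out_width - width)) - 1) << width
    return value


def e_copy_i(imm: int) -> int:
    """Model of E.COPY.I: the 64-bit value written to register ra."""
    return sign_extend(imm, 18)
```

For example, e_copy_i(0x3FFFF) yields the all-ones 64-bit pattern, while e_copy_i(0x1FFFF) is returned unchanged because bit 17 is clear.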
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Execute immediate 

These operations perform calculations with one general register value and one 
immediate value, placing the result in a general register. 

Operation codes 


E.ADD.I       Execute add immediate
E.ADD.I.SO    Execute add immediate and check signed overflow
E.AND.I       Execute and immediate
E.NAND.I      Execute nand immediate
E.NOR.I       Execute nor immediate
E.OR.I        Execute or immediate
E.XOR.I       Execute xor immediate


class         operation                                    check
arithmetic    ADD                                          NONE SO
              SUB                                          NONE SO E L UL NE GE UGE
bitwise       AND OR NAND NOR XOR
boolean       SET.E SET.L SET.UL SET.NE SET.GE SET.UGE

Format

op rb=ra,imm

31 24 23 18 17

Description

The contents of register ra are fetched, and a 64-bit immediate value is
sign-extended from the 12-bit imm field. The specified operation is performed on these
operands. The result is placed into register rb.

Definition 

def ExecuteImmediate(op,ra,rb,imm) as
    i <- (imm_{11}^{52} || imm)
    a <- REG[ra]
    case op of
        E.AND.I:
            b <- a and i
        E.OR.I:
            b <- a or i
        E.NAND.I:
            b <- a nand i
        E.NOR.I:
            b <- a nor i
        E.XOR.I:
            b <- a xor i
        E.ADD.I:
            b <- a + i
        E.ADD.I.SO:
            t <- (a_{63} || a) + (i_{63} || i)
            if t_{64} ≠ t_{63} then
                raise FixedPointArithmetic
            endif
            b <- t_{63..0}
    endcase
    REG[rb] <- b
enddef

Exceptions 
Fixed-point arithmetic 
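
The overflow check in E.ADD.I.SO extends both operands to 65 bits, adds them, and compares bits 64 and 63 of the sum. The sketch below illustrates that check in Python under the assumption that registers are unsigned 64-bit integers; the function names and the Python exception class are hypothetical stand-ins for the FixedPointArithmetic exception.

```python
MASK64 = (1 << 64) - 1


class FixedPointArithmetic(Exception):
    """Hypothetical stand-in for the fixed-point arithmetic exception."""


def sign_extend(value: int, width: int, out_width: int) -> int:
    """Replicate the top bit of a width-bit value up to out_width bits."""
    value &= (1 << width) - 1
    if value >> (width - 1):
        value |= ((1 << (out_width - width)) - 1) << width
    return value


def e_add_i_so(a: int, imm: int) -> int:
    """Add a sign-extended 12-bit immediate to a, raising on signed overflow."""
    i = sign_extend(imm, 12, 64)
    # t <- (a63 || a) + (i63 || i), kept to 65 bits
    t = (sign_extend(a, 64, 65) + sign_extend(i, 64, 65)) & ((1 << 65) - 1)
    if ((t >> 64) & 1) != ((t >> 63) & 1):
        raise FixedPointArithmetic("signed overflow")
    return t & MASK64
```

Bits 64 and 63 of the 65-bit sum differ exactly when the true result does not fit in a signed 64-bit value, which is why a single comparison suffices.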




Highly Confidential 
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Execute Immediate Reversed 

These operations perform calculations with one general register value and one 
immediate value, placing the result in a general register. 

Operation codes



Description 


The contents of register ra are fetched, and a 64-bit immediate value is
sign-extended from the 12-bit imm field. The specified operation is performed on these
operands. The result is placed into register rb.

Definition 

def ExecuteImmediate(op,ra,rb,imm) as
    i <- (imm_{11}^{52} || imm)
    a <- REG[ra]
    case op of
        E.SUB.I:
            b <- i - a
        E.SUB.I.SO:
            t <- (i_{63} || i) - (a_{63} || a)
            if t_{64} ≠ t_{63} then
                raise FixedPointArithmetic
            endif
            b <- t_{63..0}
        E.SET.I.E:
            b <- (i = a)^{64}
        E.SET.I.NE:
            b <- (i ≠ a)^{64}
        E.SET.I.L:
            b <- (i < a)^{64}
        E.SET.I.GE:
            b <- (i ≥ a)^{64}
        E.SET.I.UL:
            b <- ((0 || i) < (0 || a))^{64}
        E.SET.I.UGE:
            b <- ((0 || i) ≥ (0 || a))^{64}
        E.SUB.I.E:
            b <- i - a
            if i ≠ a then
                raise FixedPointArithmetic
            endif
        E.SUB.I.NE:
            b <- i - a
            if i = a then
                raise FixedPointArithmetic
            endif
        E.SUB.I.L:
            b <- i - a
            if i ≥ a then
                raise FixedPointArithmetic
            endif
        E.SUB.I.GE:
            b <- i - a
            if i < a then
                raise FixedPointArithmetic
            endif
        E.SUB.I.UL:
            b <- i - a
            if (0 || i) ≥ (0 || a) then
                raise FixedPointArithmetic
            endif
        E.SUB.I.UGE:
            b <- i - a
            if (0 || i) < (0 || a) then
                raise FixedPointArithmetic
            endif
    endcase
    REG[rb] <- b
enddef
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Execute Reversed 

These operations perform calculations with two general register values, placing 
the result in a general register. 

Operation codes 


E.SET.E       Execute set equal
E.SET.GE      Execute set signed greater or equal
E.SET.L       Execute set signed less
E.SET.NE      Execute set not equal
E.SET.UGE     Execute set unsigned greater or equal
E.SET.UL      Execute set unsigned less
E.SUB         Execute subtract
E.SUB.E       Execute subtract and check equal
E.SUB.GE      Execute subtract and check signed greater or equal
E.SUB.L       Execute subtract and check signed less
E.SUB.NE      Execute subtract and check not equal
E.SUB.SO      Execute subtract and check signed overflow
E.SUB.UGE     Execute subtract and check unsigned greater or equal
E.SUB.UL      Execute subtract and check unsigned less

Description

The contents of Registers ra and rb are fetched and the specified operation is 
performed on these operands. The result is placed into register rc. 

Definition 

def ExecuteReversed(op,ra,rb,rc) as
    b <- REG[rb]
    a <- REG[ra]
    case op of
        E.SUB:
            c <- b - a
        E.SUB.SO:
            t <- (b_{63} || b) - (a_{63} || a)
            if t_{64} ≠ t_{63} then
                raise FixedPointArithmetic
            endif
            c <- t_{63..0}
        E.SUB.E:
            c <- b - a
            if b ≠ a then
                raise FixedPointArithmetic
            endif
        E.SUB.NE:
            c <- b - a
            if b = a then
                raise FixedPointArithmetic
            endif
        E.SUB.L:
            c <- b - a
            if b ≥ a then
                raise FixedPointArithmetic
            endif
        E.SUB.GE:
            c <- b - a
            if b < a then
                raise FixedPointArithmetic
            endif
        E.SUB.UL:
            c <- b - a
            if (0 || b) ≥ (0 || a) then
                raise FixedPointArithmetic
            endif
        E.SUB.UGE:
            c <- b - a
            if (0 || b) < (0 || a) then
                raise FixedPointArithmetic
            endif
        E.SET.E:
            c <- (b = a)^{64}
        E.SET.NE:
            c <- (b ≠ a)^{64}
        E.SET.L:
            c <- (b < a)^{64}
        E.SET.GE:
            c <- (b ≥ a)^{64}
        E.SET.UL:
            c <- ((0 || b) < (0 || a))^{64}
        E.SET.UGE:
            c <- ((0 || b) ≥ (0 || a))^{64}
    endcase
    REG[rc] <- c
enddef

Exceptions 
Fixed-point arithmetic 
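
The set operations above replicate a one-bit comparison result across all 64 bits, and the "reversed" forms compute with rb as the left operand (b - a, b < a). A minimal Python sketch of E.SUB, E.SET.L and E.SET.UL follows, assuming registers are unsigned 64-bit integers; all function names are hypothetical and the arguments a and b stand for the contents of ra and rb.

```python
MASK64 = (1 << 64) - 1


def to_signed(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def replicate(flag: bool) -> int:
    """Replicate a one-bit result 64 times: all ones if true, else zero."""
    return MASK64 if flag else 0


def e_sub(a: int, b: int) -> int:
    """E.SUB: c <- b - a, truncated to 64 bits."""
    return (b - a) & MASK64


def e_set_l(a: int, b: int) -> int:
    """E.SET.L: signed comparison b < a."""
    return replicate(to_signed(b) < to_signed(a))


def e_set_ul(a: int, b: int) -> int:
    """E.SET.UL: unsigned comparison (0 || b) < (0 || a)."""
    return replicate((b & MASK64) < (a & MASK64))
```

With b holding the all-ones pattern and a holding zero, e_set_l reports true (b is -1 as a signed value) while e_set_ul reports false, which is the distinction the zero-extension in the unsigned forms expresses.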
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Execute Short Immediate 

These operations perform calculations with one general register value and one 
immediate value, placing the result in a general register. 

Operation codes

E.SHL.I         Execute shift left immediate
E.SHL.I.SO      Execute shift left immediate and check signed overflow
E.SHR.I         Execute signed shift right immediate
E.U.SHR.I       Execute unsigned shift right immediate
E.EXPAND.I      Execute signed expand immediate
E.U.EXPAND.I    Execute unsigned expand immediate


op rc=ra,simm 

31 24 23 



ila 3 i..o"O sifr 


/alue is taken from 
these operands. The 


        E.EXPAND.I:
            if simm_{5} then
                raise ReservedInstruction
            endif
            c <- a_{31}^{32-simm} || a_{31..0} || 0^{simm}
        E.U.EXPAND.I:
            if simm_{5} then
                raise ReservedInstruction
            endif
            c <- 0^{32-simm} || a_{31..0} || 0^{simm}
        E.SHL.I:
            c <- a_{63-simm..0} || 0^{simm}
        E.SHL.I.SO:
            if a_{63..63-simm} ≠ a_{63}^{simm+1} then
                raise FixedPointArithmetic
            endif
            c <- a_{63-simm..0} || 0^{simm}
        E.SHR.I:
            c <- a_{63}^{simm} || a_{63..simm}
    endcase
    REG[rc] <- c
enddef

Exceptions 
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Execute Ternary 

These operations perform calculations with three general register values, placing 
the result in a fourth general register. 

Operation codes 



    b <- REG[rb]
    c <- REG[rc]
    case op of
        E.MUX:
            d <- (b and a) or (c and not a)
    endcase
    REG[rd] <- d
enddef
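
E.MUX selects each result bit from one of two sources under control of a third register. The sketch below is an illustration in Python only: the function name e_mux is hypothetical, and the second term (c and not a) is an assumption, since that part of the definition is not fully legible in the source.

```python
MASK64 = (1 << 64) - 1


def e_mux(a: int, b: int, c: int) -> int:
    """Bitwise multiplex: take bits of b where a is 1, bits of c where a is 0."""
    return ((b & a) | (c & ~a)) & MASK64
```

For instance, with a = 0xF0 the upper nibble of the low byte comes from b and the lower nibble from c, so e_mux(0xF0, 0xAA, 0x55) gives 0xA5.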
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Floating-point 

These operations perform floating-point arithmetic on two floating-point operands. 

Operation codes 
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op 

prec 

round/trap 

add         ADD    16 32 64 128    none C F N T X
multiply    MUL    16 32 64 128    none C F N T X
divide      DIV    16 32 64 128    none C F N T X


Format 

F.op.prec. round rc=ra,rb 

31 24 23 18 17 
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Description 

The contents of registers ra and rb are combined using the specified floating-point 
operation. The result is placed in register rc. The operation is rounded using the 
specified rounding option or using round-to-nearest if not specified. If a rounding 
option is specified, the operation raises a floating-point exception if a floating-point 
invalid operation, divide by zero, overflow, or underflow occurs, or when specified, 
if the result is inexact. If a rounding option is not specified, floating-point 
exceptions are not raised, and are handled according to the default rules of IEEE 
754. 

Definition

def FloatingPoint(op,prec,round,ra,rb,rc) as
    case prec of
        16:
            a <- F16(REG[ra])
            b <- F16(REG[rb])
        32:
            a <- F32(REG[ra])
            b <- F32(REG[rb])
        64:
            a <- F64(REG[ra])
            b <- F64(REG[rb])
        128:
            a <- F128(REG[ra])
            b <- F128(REG[rb])
    endcase
if roundWMQNE 

if isSigoallfriaNaNia) IreSignaRingNaNj;! 

rais^ FV,5!!ngPoirt£xccption 
endif JLfjT JTlk 
case op of W % % 

: Dtv 

1 1 b=0 then 

: TalseF 
cndif 


"w^case op of 


in If % v 
;e Roat.rtgPointAmhmolfc 


F.ADD: 

c <r- a+b 
F.MUL: 

c <- a*b 
F.DIV.: 

c <- a/b 

endcase 
case round of 


N: 


T: 


C: 

NONE: 
endcase 
case prec of 
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16: 

            REG[rc] <- PackF16(c)
        32:
            REG[rc] <- PackF32(c)
        64:
            REG[rc] <- PackF64(c)
        128:
            REG[rc] <- PackF128(c)
    endcase
enddef

Exceptions

Reserved instruction
Floating-point arithmetic
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Floating-point Reversed 


These operations perform floating-point arithmetic on two floating-point operands. 


Operation codes


F.SET.E.16 


Floating-point set equal half 


Floating-point set equal half exact 


F.SET.E.32 


Floating-point set equal single 


F.SET.E.32.X 


Floating-point set equal single e^sjtt^ 


F.SET.E.64 


Floating-point set equal doubjsf :: 


Floating-point set equal dou^fe^xlct 


F.SET.E.64.X 



F.SET.L.16.X 


FtOdt}tK?^omt vet less ftatf exact 


[|pa|ng : po!nt sot teos eingie c/;y. 


F.SET.L.64.X 


F.SET.L. 128.X #®Q|FI< 


nj-polnt set Iocs double «,xact 

ng^bint%et less, (jial gxact Ml) 0023296 

[jg-poM^tnot ^ ^ rT " f 
n^po ir, set not y CiLial hait .exact 



, .,jt equal quad exact 
^Tefhot greater or equal half exact 


FilatinB-poir.^-. „ .. ,„ 

Fldal^-poinpget not greater or equal single exacT 


Floating-point set not greater or equal double exact 


USET.NGE. 128.X 


Floating-point set not greater or equal quad exact 
Floating-point set not or less half exact 


F.SET.NL16.X 


F.SET.NL32.X 


Floating-point set not or less single exact 


F.SET.NL.64.X 


Floating-point set not or less double exact 


F.SET.NL.128.X 


F.SET.NUE.16 


Floating-point set not or less quad exact 


Floating-point set not unordered or equal half 


F.SET.NUE.16.X 


Floating-point set not unordered or equal half exact 


F.SET.NUE.32 


Floating-point set not unordered or equal single 


Floating-point set not unordered or equal single exact 


F.SET.NUE.32.X 


Floating-point set not unordered or equal double 


F.SET.NUE.64 


F.SET.NUE.64.X 


Floating-point set not unordered or equal double exact 


Floating-point set not unordered or equal quad 
Floating-point set not unordered or equal quad exact 


F.SET.NUE.128 


F.SET.NUE. 128.X 
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F.SET.NUGE.16 


F.SET.NUGE.32 


_RSET.NUGE.64 


F.SET.NUGE.128" 


F.SET.NUL16 


F.SET.NUL.32 


F.SET.NUL64 


F.SET.NUL128 


F.SET.UE.16 


F.SET.UE.16.X 


F.SET.UE.32 


F.SET.UE.32.X 


F.SET.UE.64 


F.SET.UE.64.X 


F.SET.UE.128 


F.SET.UE. 128.X" 


F.SET.UGE.16 


T%gQB.32.N 


F.SUB.32.T 


F.SUB.32.X 


F.SUB.64 


F.SUB.64.C 


F.SUB.64.F 


F.SUB.64.N 


F.SUB.64.T 


F.SUB.64.X 


F.SUB.128 


F.SUB.128.C 


F.SUB.128.F 
F.SUB.128.N 


Floating-point set not unordered greater or equal half 


Floating-point set not unordered greater or equal single 


Floating-point set not unordered greater or equal double" 


Floating-point set not unordered greater or equal quad 


Floating-point set not unordered or less half 


Floating-point set not unordered or less single 


Floating-point set not unordered or less double 


Floating-point set not unordered or less quad 


Floating-point set greater or equal J?a1 


Floating-point set greater or equal fiaff exa ct 


Floating-point set greater or etfia%pngle 


Floating-point set greater or^gfi%i single exact 


Floating-point set greater. 


laydouble 


Floating-pointy greate r" A .jble exact 


Hoahnn-pPi' t ^ei prea b o> equal quad exact 
Floatingpoint set unordered greater or e 



Floating-point subtract single floor 


Floating-point subtract single nearest 


Floating-point subtract single truncate 


Floating-point subtract single exact 


Floating-point subtract double 




Floating-point subtract double ceiling 


Floating-point subtract double floor 


Floating-point subtract double nearest 


Floating-point subtract double truncate 


Floating-point subtract double exact 


Floating-point subtract quad 


Floating-point subtract quad ceiling 


Floating-point subtract quad floor 
Floating-point subtract quad nearest 
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F.SUB.128.T 

Floating-point subtract quad truncate 

F.SUB. 128.X 

Floating-point subtract quad exact 



            op                                 prec            round/trap
set         SET.E SET.NE SET.UE SET.NUE        16 32 64 128    none X
            SET.NUGE SET.NUL SET.UGE SET.UL    16 32 64 128    none
            SET.L SET.GE SET.NL SET.NGE        16 32 64 128    X
subtract    SUB                                16 32 64 128    none C F N T X


F.op.prec.round rc=rb,ra 

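31 24 23

Description

The contents of registers ra and rb are combined using the specified floating-point
operation. The result is placed in register rc. The operation is rounded using the
specified rounding option or using round-to-nearest if not specified. If a rounding
option is specified, the operation raises a floating-point exception if a floating-point
invalid operation, divide by zero, overflow, or underflow occurs, or when specified,
if the result is inexact. If a rounding option is not specified, floating-point
exceptions are not raised, and are handled according to the default rules of IEEE
754.

Definition

def FloatingPointReversed(op,prec,round,ra,rb,rc) as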


    case prec of
        16:
            a <- F16(REG[ra])
            b <- F16(REG[rb])
        32:
            a <- F32(REG[ra])
            b <- F32(REG[rb])
        64:
            a <- F64(REG[ra])
            b <- F64(REG[rb])
        128:
            a <- F128(REG[ra])
            b <- F128(REG[rb])
    endcase
    if round ≠ NONE then
        if isSignallingNaN(a) | isSignallingNaN(b) then
            raise FloatingPointException
        endif
        case op of
            F.SET.L, F.SET.GE, F.SET.NL, F.SET.NGE:
                if isNaN(a) | isNaN(b) then
                    raise FloatingPointArithmetic
                endif
            others:
        endcase
    endif

    case op of
        F.SUB:
            c <- b - a
        F.SET.NUGE, F.SET.L:
            c <- b !?>= a
        F.SET.NUL, F.SET.GE:
            c <- b !?< a
        F.SET.UGE, F.SET.NL:
            c <- b ?>= a
        F.SET.UL, F.SET.NGE:
            c <- b ?< a
        F.SET.UE:
            c <- b ?= a
        F.SET.NUE:
            c <- b !?= a
        F.SET.E:
            c <- b = a
        F.SET.NE:
            c <- b ≠ a



        16:
            REG[rc] <- PackF16(c)
        32:
            REG[rc] <- PackF32(c)
        64:
            REG[rc] <- PackF64(c)
        128:
            REG[rc] <- PackF128(c)
        INT:
            REG[rc] <- c
    endcase
enddef

Exceptions 

Reserved instruction 
Floating-point arithmetic 
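
The set predicates differ only in how they treat unordered operands (either operand a NaN). The following Python sketch tabulates the twelve conditions as boolean functions of the contents of ra and rb, read here as "unordered or ..." (?) and "not (unordered or ...)" (!?). The function names are hypothetical, the sketch uses Python floats rather than the four precisions, and it does not model the exceptions that the exact (.X) forms raise.

```python
import math


def unordered(a: float, b: float) -> bool:
    """True when the operands cannot be ordered, i.e. either is a NaN."""
    return math.isnan(a) or math.isnan(b)


def f_set(cond: str, a: float, b: float) -> bool:
    """Evaluate a set condition with rb (b) as the left operand."""
    u = unordered(a, b)
    table = {
        "E": b == a,
        "NE": b != a,
        "UE": u or b == a,
        "NUE": not (u or b == a),
        "L": not (u or b >= a),
        "NUGE": not (u or b >= a),
        "GE": not (u or b < a),
        "NUL": not (u or b < a),
        "NL": u or b >= a,
        "UGE": u or b >= a,
        "NGE": u or b < a,
        "UL": u or b < a,
    }
    return table[cond]
```

Conditions that share a row in the definition (for example NUGE and L) give the same boolean result; according to the definition they differ in whether a NaN operand raises a floating-point exception.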
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Floating-point Ternary 


These operations perform floating-point arithmetic on three floating-point 
operands.

Operation codes 


F.MULADD.16     Floating-point multiply and add half
F.MULADD.32     Floating-point multiply and add single
F.MULADD.64     Floating-point multiply and add double
F.MULADD.128    Floating-point multiply and add quad
F.MULSUB.16     Floating-point multiply and subtract half
F.MULSUB.32     Floating-point multiply and subtract single
F.MULSUB.64     Floating-point multiply and subtract double
F.MULSUB.128    Floating-point multiply and subtract quad





multiply and add         MULADD    16 32 64 128
multiply and subtract    MULSUB    16 32 64 128


F.operation.typ ; | 

31 



Description

The contents of registers ra and rb are multiplied together, and the contents of
register rc are added to or subtracted from the product. The result is placed in
register rd. The result is rounded to the nearest representable floating-point value
in a single floating-point operation. Floating-point exceptions are not raised, and
are handled according to the default rules of IEEE 754. These instructions cannot
select a directed rounding mode or trap on inexact.

Definition 

def FloatingPointTernary(op,ra,rb,rc,rd) as
    case op of
        FMULADD16, FMULSUB16:
            a <- F16(REG[ra])
            b <- F16(REG[rb])
            c <- F16(REG[rc])
        FMULADD32, FMULSUB32:
            a <- F32(REG[ra])
            b <- F32(REG[rb])
            c <- F32(REG[rc])
        FMULADD64, FMULSUB64:
            a <- F64(REG[ra])
            b <- F64(REG[rb])
            c <- F64(REG[rc])
        FMULADD128, FMULSUB128:
            a <- F128(REG[ra])
            b <- F128(REG[rb])
            c <- F128(REG[rc])
    endcase
    case op of
        FMULADD16, FMULADD32, FMULADD64, FMULADD128:
            d <- a*b+c
        FMULSUB16, FMULSUB32, FMULSUB64, FMULSUB128:
            d <- a*b-c
    endcase
    case op of
        FMULADD16, FMULSUB16:
            REG[rd] <- PackF16(d)
        FMULADD32, FMULSUB32:
            REG[rd] <- PackF32(d)
        FMULADD64, FMULSUB64:
            REG[rd] <- PackF64(d)
        FMULADD128, FMULSUB128:
            REG[rd] <- PackF128(d)
    endcase
enddef

Exceptions

Reserved instruction
Floating-point arithmetic
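
Rounding once, after the multiply and the add, can give a different result from rounding the product and the sum separately. The sketch below illustrates the difference for double precision in Python; it is an illustration only, not how the hardware computes. It uses exact rational arithmetic from the standard fractions module and then rounds once to the nearest double. The function names are hypothetical, and infinities and NaNs are not handled.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b+c computed exactly, then rounded once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fused_multiply_sub(a: float, b: float, c: float) -> float:
    """a*b-c computed exactly, then rounded once to the nearest double."""
    return float(Fraction(a) * Fraction(b) - Fraction(c))


# With a = 1 + 2**-27, the exact square is 1 + 2**-26 + 2**-54. Rounding the
# product first discards the 2**-54 term, so the separate computation loses it.
a = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)
single_rounding = fused_multiply_add(a, a, c)
two_roundings = a * a + c
```

Here single_rounding retains the low-order term of the product that two_roundings loses, which is the practical benefit of performing the multiply and add as one operation.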
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Floating-point Unary 

These operations perform floating-point arithmetic on one floating-point operand. 

Operation codes


F.ABS.16 


F.ABS.16.X 


F.ABS.32 


F.ABS.32.X 


F.FL0AT.16.T 


F. FLOAT. 16.X" 


F.FLOAT.32 


F. FLOAT. 32. C 


F.FLOAT.32.F 


F.FLOAT.32.N 


F. FLOAT. 32 .T 


F.FLOAT.32. X 


Floating-point absolute value half


Floating-point absolute value half exact


Floating-point absolute value single


Floating-point absolute value single exact


Floating-point absolute value double
Floating-point absolute value double exact
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Floating-point convert half from integer nearest 


Floating-point convert half from integer truncate 


Floating-point convert half from integer exact 


Floating-point convert single from integer 


Floating-point convert single from integer ceiling 


Floating-point convert single from integer floor


Floating-point convert single from integer nearest 


ig-point convert single from integer truncate 


F. FLOAT. 64 
F.FLOAT.64.C 


Floating-point convert single from integer exact 


Floating-point convert double from integer 
Floating-point convert double from integer ceiling 
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F.FLOAT.64.F 


Floating-point convert double from integer floor 


F.FLOAT.64.N 


Floating-point convert double from integer nearest 


Floating-point convert double from integer truncate 


F.FLOAT.64.T 


Floating-point convert double from integer exact 


F.FLOAT.64.X 


F. FLOAT. 128 


Floating-point convert quad from integer 


F.INFLATE.16 


Floating-point convert single from half 


Floating-point convert single from half exact 
Floating-point convert double from s|ngle 


F. INFLATE. 16.X 


F. INFLATE. 32 



F.SINK.32.&; 


Floating-point convert integer from single floor 


yfoatin^j-pdfet convert integer from single nearest 


F.SINK.32^ • 


F.SIN^,32.T 


F[patTng-po.-u ^:r.-7 - rt integer from single truncate 


F[Q£;ting-poiii||£orivert integer from single exact 


F.SLNIjfftX , 


NK.64 


Floalfffg-point convert integer from double 


fX : ^64.C ' 


Floating-point convert integer from double ceiling 


R;3!NK.64.F 


Floating-point convert integer from double floor 


F.SINK.64.N 


Floating-point convert integer from double nearest 


F.SINK.64.T 


Floating-point convert integer from double truncate 


Floating-point convert integer from double exact 


F.SINK.64.X 


F.SINK.128 


F.SINK.128.C 


Floating-point convert integer from quad 


Floating-point convert integer from quad ceiling 


F.SINK.128.F 


Floating-point convert integer from quad floor 


F.SINK.128.N 


Floating-point convert integer from quad nearest 
Floating-point convert integer from quad truncate 


F.SINK.128.T 


F.SINK.128.X 


Floating-point convert integer from quad exact 


F.SQR.16       Floating-point square root half
F.SQR.16.C     Floating-point square root half ceiling
F.SQR.16.F     Floating-point square root half floor
F.SQR.16.N     Floating-point square root half nearest
F.SQR.16.T     Floating-point square root half truncate
F.SQR.16.X     Floating-point square root half exact
F.SQR.32       Floating-point square root single
F.SQR.32.C     Floating-point square root single ceiling
F.SQR.32.F     Floating-point square root single floor
F.SQR.32.N     Floating-point square root single nearest
F.SQR.32.T     Floating-point square root single truncate
F.SQR.32.X     Floating-point square root single exact
F.SQR.64       Floating-point square root double
F.SQR.64.C     Floating-point square root double ceiling
F.SQR.64.F     Floating-point square root double floor
F.SQR.64.N     Floating-point square root double nearest
F.SQR.64.T     Floating-point square root double truncate
F.SQR.64.X     Floating-point square root double exact
F.SQR.128      Floating-point square root quad
F.SQR.128.C    Floating-point square root quad ceiling
F.SQR.128.F    Floating-point square root quad floor
F.SQR.128.N    Floating-point square root quad nearest
F.SQR.128.T    Floating-point square root quad truncate
F.SQR.128.X    Floating-point square root quad exact
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Format 

F.op.prec.round rc=ra 
31 24 23 


F.prec 

ra 

op 

rc 

UNARY. 





round 


Description

The contents of register ra are used as the operand of the specified floating-point
operation. The result is placed in register rc. The operation is rounded using the
specified rounding option or using round-to-nearest if not specified. If a rounding
option is specified, the operation raises a floating-point exception if a floating-point
invalid operation, divide by zero, overflow, or underflow occurs, or when specified,
if the result is inexact. If a rounding option is not specified, floating-point
exceptions are not raised, and are handled according to the default rules of IEEE
754.

Definition

def FloatingPointUnary(op
    if op = F.FLOAT


            c <- √a
        F.FLOAT, F.SINK, F.INFLATE, F.DEFLATE:
            c <- a
    endcase
    case op of
        F.ABS, F.NEG, F.SQR, F.FLOAT:
            destprec <- prec
        F.SINK:
            destprec <- INT
        F.INFLATE:
            destprec <- prec + prec
        F.DEFLATE:
            destprec <- prec / 2
    endcase
case round of 

X: 

N: 


C: 

NONE: 
endcase 

    case destprec of
        16:
            REG[rc] <- PackF16(c)
        32:
            REG[rc] <- PackF32(c)
        64:
            REG[rc] <- PackF64(c)
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Group 

These operations take two values from a pair of registers, perform operations on 
groups of bits in the operands, and place the concatenated results in a register , 

Operation codes


G.ADD.2 


Group add pecks 


Group add nibbles 


G.ADD.4 


G.ADD.8 


Group add bytes 


G.ADD.16 


Group add doublets 


G.ADD.32 


Group add quadlets 


G.ADD.64 


G.AND 10 


Group add octlets 

Group and ^ 



G.DEAL.32 


Group deal quadlets 


G.DIV.64 


Group signed divide octlets 


G. EXPAND. 1 


G. EXPAND. 2 


Group signed expand bits 


G.EXPAND.4 


Group signed expand pecks 


G. EXPAND. 8 


Group signed expand nibbles 




Group signed expand bytes 


G.EXPAND.16 


Group signed expand doublets 


G.EXPAND.32 


Group signed expand quadlets 


G.EXPAND.64 


Group signed expand octlet 


10 G.AND does not require a size specification, and is encoded as G.AND.1.

11 G.ANDN does not require a size specification, and is encoded as G.ANDN.1. G.ANDN is
used as the encoding for G.SET.L.1, and by reversing the operands, for G.SET.UL.1.

12 G.GATHER.128 is encoded as G.GATHER.1.

13 G.MUL.1 is used as the encoding for G.U.MUL.1.

14 G.NAND does not require a size specification, and is encoded as G.NAND.1.

15 G.NOR does not require a size specification, and is encoded as G.NOR.1.

16 G.OR does not require a size specification, and is encoded as G.OR.1.

17 G.ORN does not require a size specification, and is encoded as G.ORN.1. G.ORN is used as
the encoding for G.SET.UGE.1, and by reversing the operands, for G.SET.GE.1.

18 G.SCATTER.128 is encoded as G.SCATTER.1.

Highly Confidential : 

For evaluation only -97- microunity confidential 


Terpsichore System Architecture 


REDACTED 


Group signed shift right pecks 


Group signed shift right nibbles 


Group signed shift right bytes 
... dQul . 


G.SHR.8 


Group signed shift right doublets 


G.SHR.16 


Group signed shift right quadlets 


G.SHR.32 


G.SHR.64 


Group signed shift right octlets 


G.SHUFFLE.1 


Group shuffle bits 


G.SHUFFLE.2 


Group shuffle pecks 


Group shuffle nibbles 


G.SHUFFLE.4 


G.SHUFFLE.8 


Group shuffle bytes 


G.SHUFFLE.1 6 


Group shuffle doublets 


Group shuffle quadlets 


G.SHUFFLE.32 


G.SWAP.1 


Group swap bit^ 


G .SWAP. 2 


Group swap oec^ 


G. SWAP. 4 


G.SWAP.8 


Group swap^bblas. 


Group swafrbytes r 


G.SWAP.1 6 


Group &wap ; ,aQiillets ; - : 


G.SWAP.32 


G.U.DIV.64 


lividg 



G.U.EXPAND.64/ 


G r oup unsigned 6*pam 


G.U.MUL2 



%U.SHR.4 


Group unsigned shift right nibbles 


Group unsigned shift right bytes 


G.U.SHRi 


Group unsigned shift right doublets 


G.U.SHR.16 


G.U.SHR.32 


G.U.SHR.64 


Group unsigned shift right quadlets 


G.XNOR 19 


Group unsigned shift right octlets 


Group exclusive-nor 


Group exclusive-or 




G.XOR20 


19 G.XNOR does not require a size specification, and is encoded as G.XNOR.1. G.XNOR is
used as the encoding for G.SET.E.1.

20 G.XOR does not require a size specification, and is encoded as G.XOR.1. G.XOR is used as
the encoding for G.ADD.1, G.SUB.1 and G.SET.NE.1.
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class 

OP - • - • " " £ 

size. 

linear 

ADD 

.248 16 32 64 

bitwise 

AND ANDN NAND NOR 
OR • ORN XNOR XOR 


signed multiply 

MUL 

1 2 4 8 16 32 64 

unsigned 
multiply 

U.MUL 

2 4 8 16 32 64 

signed divide 

DiV 

64 

unsigned 
divide 

U.DIV 


rearrange 

COPY DEAL 
SWAP SHUFFLE 



GATHER SCATTER A 

4 8 16 32 64 

galois field 

POLY 

¥ 2#1 16 32 64 

precision 

COMPRESS B&PANI ) 

n f^PS 16 32 64 

shift 

SHL SHR.- ' 

V32 4 8 16 32 64 


Format 



Description 

Two values are taken from the contents of registers ra and rb. The specified
operation is performed, and the result is placed in register rc.

Definition

def Group(op,size,ra,rb,rc) as
    case op of

        G.MUL, G.U.MUL, G.DIV, G.U.DIV:
            a <- REG[ra]
            b <- REG[rb]
        G.ADD, G.SUB, G.SET.L, G.SET.UL, G.SET.E, G.SET.NE, G.SET.GE, G.SET.UGE,
        G.AND, G.OR, G.XOR, G.ANDN, G.NAND, G.NOR, G.XNOR, G.ORN,
        G.GATHER, G.SCATTER:
            a <- REG[ra]
            b <- REG[rb]
        G.COMPRESS, G.SHL, G.SHR, G.U.SHR, G.POLY:
            a <- REG[ra]
            b <- REG[rb]
        G.EXPAND, G.U.EXPAND:
            a <- REG[ra]
            b <- REG[rb]
        G.COPY, G.SWAP, G.DEAL, G.SHUFFLE:
            a <- REG[ra] || REG[rb]
    endcase
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case op of 
        G.ADD:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{i+size-1..i} + b_{i+size-1..i}
            endfor
        G.MUL:
            for i <- 0 to 64-size by size
                c_{2*(i+size)-1..2*i} <- (a_{size-1+i}^{size} || a_{size-1+i..i}) * (b_{size-1+i}^{size} || b_{size-1+i..i})
            endfor
        G.U.MUL:
            for i <- 0 to 64-size by size
                c_{2*(i+size)-1..2*i} <- (0^{size} || a_{size-1+i..i}) * (0^{size} || b_{size-1+i..i})
            endfor
        G.DIV:
            if (b = 0) or ((a = (1 || 0^{63})) and (b = 1^{64})) then
                c <- undefined
            else
                c <- r_{63..0} || q_{63..0}
            endif
        G.U.DIV:
            if b = 0 then
                c <- undefined
            else
                q <- (0 || a) / (0 || b)

        G.OR:
            c <- a or b
        G.XOR:
            c <- a xor b
        G.ANDN:
            c <- a and not b
        G.NAND:
            c <- not (a and b)
        G.NOR:
            c <- not (a or b)
        G.XNOR:
            c <- not (a xor b)
        G.ORN:
            c <- a or not b
        G.POLY:
            p[0] <- a
            for i <- 1 to size
                p[i] <- (p[i-1]_{0} ? (0^{64} || b) : 0^{128}) xor (p[i-1]_{0} || p[i-1]_{127..1})
            endfor
            c <- p[size]
        G.GATHER:
            for k <- 0 to 128-size by size
                for i <- k to k+size-1 by 1
                    if a_{i} then
                        c_{j} <- b_{i}
                        j <- j + 1
                    endif
                endfor
                j <- k+size-1
                for i <- k+size-1 to k by -1
                    if ~a_{i} then
                        c_{j} <- b_{i}
                        j <- j - 1
                    endif
                endfor
            endfor
        G.SCATTER:
            for k <- 0 to 128-size by size
                for i <- k to k+size-1 by 1
                    if a_{i} then
                        c_{i} <- b_{j}
                        j <- j + 1
                    endif
                endfor
                j <- k+size-1
                for i <- k+size-1 to k by -1
                    if ~a_{i} then
                        c_{i} <- b_{j}
                        j <- j - 1
                    endif
                endfor
            endfor

G.COMPftESS- 


for > <— 0 to 64'Stzo by me 

'■> Cj^ t ..tv'i <-.a'i-H+Mze-1+(t3&fj.t2e-n) jH+(bS(3ize-1j) 
e ndfoi 

G.EXPAND ? x * - - 

<or * +- 0 to $4,si^ by uze 

fencttot 

^^^^toe^sWslzeX 

jJJV" Vq+i+size+size-IJ+i <- o size ~< b& ( size - 1 ))|l aj +s ize-t.j H 0 b&(si2e - 1) 

        G.SHL:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{i+size-1-(b&(size-1))..i} || 0^{b&(size-1)}
            endfor
        G.SHR:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{i+size-1}^{b&(size-1)} || a_{i+size-1..i+(b&(size-1))}
            endfor
        G.U.SHR:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- 0^{b&(size-1)} || a_{i+size-1..i+(b&(size-1))}
            endfor
        G.COPY:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{size-1..0}
            endfor
        G.SWAP:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{127-i..128-size-i}
            endfor
        G.DEAL:
            for i <- 0 to 128-size by size
                j <- (i_{5..0} || 0^{1}) + (i_{6} ? size : 0)
                c_{i+size-1..i} <- a_{j+size-1..j}
            endfor
        G.SHUFFLE:
            for i <- 0 to 128-size by size
                j <- (0^{1} || i_{6..1}) + ((i & size) ? (64 - (0^{1} || size_{6..1})) : 0)
                c_{i+size-1..i} <- a_{j+size-1..j}
            endfor
    endcase
    REG[rc] <- c
enddef

Exceptions

Reserved Instruction
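
The group loops above treat a 128-bit value as independent fields of size bits. As an illustration, the following Python sketch models G.ADD and G.U.SHR on 128-bit integers; the function names are hypothetical, and the operands are assumed to be already assembled into 128-bit values.

```python
def g_add(a: int, b: int, size: int) -> int:
    """G.ADD: add corresponding size-bit fields; carries do not cross fields."""
    lane = (1 << size) - 1
    c = 0
    for i in range(0, 128, size):  # i <- 0 to 128-size by size
        field_sum = (((a >> i) & lane) + ((b >> i) & lane)) & lane
        c |= field_sum << i
    return c


def g_u_shr(a: int, b: int, size: int) -> int:
    """G.U.SHR: shift every size-bit field right by b & (size-1), zero fill."""
    lane = (1 << size) - 1
    shift = b & (size - 1)
    c = 0
    for i in range(0, 128, size):
        c |= (((a >> i) & lane) >> shift) << i
    return c
```

Adding 0x01 to 0xFF in byte-sized fields wraps to zero without disturbing the neighbouring byte, whereas the same operands treated as doublets give 0x0100.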




Group Extract Immediate

These operations perform calculations with two general register values and a
small immediate field, placing the result in a third general register.

Operation codes


G.EXTRACT.I.1      Group extract immediate bits
G.EXTRACT.I.2      Group extract immediate pecks
G.EXTRACT.I.4      Group extract immediate nibbles
G.EXTRACT.I.8      Group extract immediate bytes
G.EXTRACT.I.16     Group extract immediate doublets
G.EXTRACT.I.32     Group extract immediate quadlets
G.EXTRACT.I.64     Group extract immediate octlets
G.EXTRACT.I.128    Group extract immediate hextet



: etched. The specified 
placed into the register 


        G.8:
            size <- 8
            shift <- 8
        G.4:
            size <- 4
            shift <- 4
        G.2:
            size <- 2
            shift <- 2
        G.1:
            size <- 1
            shift <- 1
        G.EXTRACT.I:
            case minor of
                0..31:
                    size <- 32
                    shift <- minor
                32..47:
                    size <- 16
                    shift <- minor - 32
                48..55:
                    size <- 8
                    shift <- minor - 48
                56..59:
                    size <- 4
                    shift <- minor - 56
                60..61:
                    size <- 2
                    shift <- minor - 60
                62:
                    size <- 1
                    shift <- 0
                63:
                    raise ReservedInstruction
            endcase
        G.EXTRACT.I.64:
            size <- 64
            shift <- minor

G tXTPAt'I I \2o UO EnnACT ! 128 

shiit V op' li minor j£ ©J 

endcase ^ . / V ^v, 

for i <- 0 to 1 >R size bv size \ ^ 

Ci+ i: r ! I ♦ 3b sMt+S«sM i+W-shlft 

!f--.r ... - 


1 
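
The minor-field decoding in the G.EXTRACT.I case can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the use of ValueError to stand in for the reserved-instruction exception is an assumption, and the offset of 60 for the 60..61 range is inferred from the pattern of the other ranges because the source text is illegible at that point.

```python
def decode_extract_minor(minor):
    """Map the 6-bit minor field of G.EXTRACT.I to a (size, shift) pair."""
    if not 0 <= minor <= 63:
        raise ValueError("minor must fit in 6 bits")
    if minor <= 31:
        return 32, minor
    if minor <= 47:
        return 16, minor - 32
    if minor <= 55:
        return 8, minor - 48
    if minor <= 59:
        return 4, minor - 56
    if minor <= 61:
        # Assumption: base of 60 inferred from the other ranges.
        return 2, minor - 60
    if minor == 62:
        return 1, 0
    # minor == 63 is the reserved encoding.
    raise ValueError("reserved instruction")
```

Each size gets exactly as many minor values as it has legal shift amounts (32 for quadlets, 16 for doublets, and so on), which is why a single 6-bit field is enough to encode both quantities.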
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Group Reversed 

These operations take two values from a pair of registers, perform operations on 
groups of bits in the operands, and place the concatenated results in a register. 


Operation codes
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G.SUB.16 

Group subtract doublets 

G.SUB.32 

Group subtract quadlets

G.SUB.64 

Group subtract octlets 


class      op                        size
linear     SUB                       2 4 8 16 32 64
boolean    SET.E  SET.L  SET.GE      2 4 8 16 32 64
           SET.NE SET.UL SET.UGE



rc=rb,ra 

24 23 


G.size 


12 13- 


OP I 


Description

Two values are taken from the contents of registers ra and rb. The specified
operation is performed, and the result is placed in register rc.


Definition

def GroupReversed(op,size,ra,rb,rc)
    a <- REG[ra]
    b <- REG[rb]
    case op of
        G.SUB:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- b_{i+size-1..i} - a_{i+size-1..i}
            endfor
        G.SET.L:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- (b_{i+size-1..i} < a_{i+size-1..i})^{size}
            endfor
        G.SET.UL:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- ((0 || b_{i+size-1..i}) < (0 || a_{i+size-1..i}))^{size}
            endfor
        G.SET.E:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- (b_{i+size-1..i} = a_{i+size-1..i})^{size}
            endfor
        G.SET.NE:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- (b_{i+size-1..i} != a_{i+size-1..i})^{size}
            endfor
        G.SET.GE:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- (b_{i+size-1..i} >= a_{i+size-1..i})^{size}
            endfor
        G.SET.UGE:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- ((0 || b_{i+size-1..i}) >= (0 || a_{i+size-1..i}))^{size}
            endfor
    endcase
    REG[rc] <- c
enddef
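
As an illustration of the definition above, the following Python sketch applies the reversed group operations to 128-bit values held as Python integers. The function and table names are hypothetical, and the treatment of G.SET.L and G.SET.GE as signed comparisons (with the U variants unsigned) is an assumption based on the zero-extension that appears only in the U forms.

```python
MASK128 = (1 << 128) - 1

def to_signed(x, size):
    """Interpret a size-bit field as a two's-complement integer."""
    return x - (1 << size) if x >> (size - 1) else x

# Each predicate receives (b_lane, a_lane, size); operand order is b op a.
PREDICATES = {
    "SET.E":   lambda b, a, s: b == a,
    "SET.NE":  lambda b, a, s: b != a,
    "SET.L":   lambda b, a, s: to_signed(b, s) < to_signed(a, s),
    "SET.GE":  lambda b, a, s: to_signed(b, s) >= to_signed(a, s),
    "SET.UL":  lambda b, a, s: b < a,
    "SET.UGE": lambda b, a, s: b >= a,
}

def group_reversed(op, size, a, b):
    """Apply op to every size-bit lane of the 128-bit integers a and b."""
    lane_mask = (1 << size) - 1
    c = 0
    for i in range(0, 128, size):
        ai = (a >> i) & lane_mask
        bi = (b >> i) & lane_mask
        if op == "SUB":
            ci = (bi - ai) & lane_mask          # wraps within the lane
        else:
            # A true comparison fills the lane with ones, false with zeros.
            ci = lane_mask if PREDICATES[op](bi, ai, size) else 0
        c |= ci << i
    return c
```

For example, with byte lanes, `group_reversed("SUB", 8, 0xFF01, 0x0103)` subtracts each byte of the first operand from the matching byte of the second and yields `0x0202`.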

Exceptions 


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Group Short Immediate 

These operations take two values from a pair of registers, perform operations on 
groups of bits in the operands, and place the concatenated results in a register. 

Operation codes 



G.SHL.I.8           Group shift left immediate bytes
G.SHL.I.16          Group shift left immediate doublets
G.SHL.I.32          Group shift left immediate quadlets
G.SHL.I.64          Group shift left immediate octlets
G.SHR.I.2           Group signed shift right immediate pecks
G.SHR.I.4           Group signed shift right immediate nibbles
G.SHR.I.8           Group signed shift right immediate bytes
G.SHR.I.16          Group signed shift right immediate doublets
G.SHR.I.32          Group signed shift right immediate quadlets
G.SHR.I.64          Group signed shift right immediate octlets


G.U.EXPAND.I.1      Group unsigned expand immediate bits
G.U.EXPAND.I.2      Group unsigned expand immediate pecks
G.U.EXPAND.I.4      Group unsigned expand immediate nibbles
G.U.EXPAND.I.8      Group unsigned expand immediate bytes
G.U.EXPAND.I.16     Group unsigned expand immediate doublets
G.U.EXPAND.I.32     Group unsigned expand immediate quadlets
G.U.EXPAND.I.64     Group unsigned expand immediate octlets
G.U.SHR.I.2         Group unsigned shift right immediate pecks
G.U.SHR.I.4         Group unsigned shift right immediate nibbles
G.U.SHR.I.8         Group unsigned shift right immediate bytes
G.U.SHR.I.16        Group unsigned shift right immediate doublets
G.U.SHR.I.32        Group unsigned shift right immediate quadlets
G.U.SHR.I.64        Group unsigned shift right immediate octlets

class        op                                size
precision    COMPRESS.I EXPAND.I U.EXPAND.I    1 2 4 8 16 32 64
shift        SHL.I SHR.I U.SHR.I               2 4 8 16 32 64


G.op.size rc=ra,simm

 31      24 23    18 17    12 11     6 5      0
| G.size   |   ra   |  simm  |   rc   |   op   |


Description

A 128-bit value is taken from the contents of register ra. The second operand is
taken from simm. The specified operation is performed, and the result is placed in
register rc.

This instruction is unci... an exception if the
simm field is greater o...

            for i <- 0 to 64-size by size
                c_{i+i+size+size-1..i+i} <- a_{i+size-1}^{size-simm} || a_{i+size-1..i} || 0^{simm}
            endfor
        G.U.EXPAND.I:
            for i <- 0 to 64-size by size
                c_{i+i+size+size-1..i+i} <- 0^{size-simm} || a_{i+size-1..i} || 0^{simm}
            endfor
        G.SHL.I:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{i+size-1-simm..i} || 0^{simm}
            endfor
        G.SHR.I:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- a_{i+size-1}^{simm} || a_{i+size-1..i+simm}
            endfor
        G.U.SHR.I:
            for i <- 0 to 128-size by size
                c_{i+size-1..i} <- 0^{simm} || a_{i+size-1..i+simm}
            endfor
    endcase
    REG[rc] <- c
enddef

Exceptions 
Reserved Instruction 
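
The three shift cases in the definition above can be illustrated with a short Python sketch operating on a 128-bit value held as an integer. The function name is hypothetical, and rejecting simm values outside 0..size-1 with a ValueError is an assumption standing in for the exception condition, whose exact wording is not legible in the description.

```python
def group_shift_immediate(op, size, a, simm):
    """Shift every size-bit lane of the 128-bit integer a by simm bits."""
    if not 0 <= simm < size:
        # Assumption: out-of-range shift amounts are treated as an error.
        raise ValueError("simm out of range for this size")
    lane_mask = (1 << size) - 1
    c = 0
    for i in range(0, 128, size):
        lane = (a >> i) & lane_mask
        if op == "SHL.I":
            r = (lane << simm) & lane_mask      # zeros enter from the right
        elif op == "U.SHR.I":
            r = lane >> simm                    # zeros enter from the left
        elif op == "SHR.I":
            signed = lane - (1 << size) if lane >> (size - 1) else lane
            r = (signed >> simm) & lane_mask    # copies of the sign bit enter
        else:
            raise ValueError("unknown op")
        c |= r << i
    return c
```

With byte lanes and the operand `0x8001`, a one-bit `SHR.I` gives `0xC000` while `U.SHR.I` gives `0x4000`, which shows the difference between replicating the sign bit and inserting zeros.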


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Group Ternary 

These operations perform calculations with three general register values, placing 
the result in a fourth general register. 


Operation codes 



Format 
G.op.size 

31 

| op.size 


rd=ra,rb,rc 

24 23 


1 * 


12 11 

~T~ 


21 G.MULADD.1 is used as the encoding for G.U.MULADD.1
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Description 

The contents of registers ra, rb, and rc are fetched. The specified operation is 
performed on these operands. The result is placed into register rd. 

Definition 

def GroupTernary(op,size,ra,rb,rc,rd) as
    a <- REG[ra]
    b <- REG[rb]
    c <- REG[rc]
    case op of
        G.MUX:
            d <- (b and a) or (c andnot a)
        G.MUX.GATHER:
            t <- (b and a) or (c andnot a)

foVi <- 0 to 127 by 1 J^K ^JV*" f^f/ 

endif ^ '\ , \ ■ 4 & 

end for 

j<-127 " ~< 

fori *- 127-fefrb.f -V* ▼ 

i *- ti 

g.scAtter^J^ ji ^ # 

- %>ri*-Q*tot27 by I 

"I else I 4 €/% 

'" : endfor 
endfor 

        G.EXTRACT:
            d <- (a || b)_{(c&127)+127..(c&127)}
        G.MULADD:
            for i <- 0 to 64-size by size
                d_{2*(i+size)-1..2*i} <- c_{2*(i+size)-1..2*i} +
                    (a_{size-1+i}^{size} || a_{size-1+i..i}) * (b_{size-1+i}^{size} || b_{size-1+i..i})
            endfor
        G.U.MULADD:
            for i <- 0 to 64-size by size
                d_{2*(i+size)-1..2*i} <- c_{2*(i+size)-1..2*i} +
                    (0^{size} || a_{size-1+i..i}) * (0^{size} || b_{size-1+i..i})
            endfor
    endcase
    REG[rd] <- d
enddef
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Exceptions 
Reserved instruction 
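
Two of the ternary cases, G.MUX and G.U.MULADD, can be sketched in Python as follows. The function names are hypothetical, and the wrap-around of each accumulated product to 2*size bits is an assumption; the definition does not state what happens when the sum does not fit.

```python
MASK128 = (1 << 128) - 1

def g_mux(a, b, c):
    """Bitwise select: take b where a has a 1 bit, c where a has a 0 bit."""
    return ((b & a) | (c & ~a)) & MASK128

def g_u_muladd(size, a, b, c):
    """Unsigned multiply-add on the size-bit lanes in the low 64 bits of a, b.

    Each product is 2*size bits wide and is added to the matching
    2*size-bit lane of c.
    """
    lane_mask = (1 << size) - 1
    wide_mask = (1 << (2 * size)) - 1
    d = 0
    for i in range(0, 64, size):
        ai = (a >> i) & lane_mask
        bi = (b >> i) & lane_mask
        ci = (c >> (2 * i)) & wide_mask
        # Assumption: the sum wraps modulo 2**(2*size).
        d |= ((ci + ai * bi) & wide_mask) << (2 * i)
    return d
```

Because the products are twice as wide as the inputs, only the low 64 bits of a and b take part, while c and the result use all 128 bits.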
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Group Floating-point 

These operations take two values from registers, perform floating-point arithmetic 
on groups of bits in the operands, and place the concatenated results in a register. 
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For evaluation only 




GF.MUL.16.N         Group floating-point multiply half nearest
GF.MUL.16.T         Group floating-point multiply half truncate
GF.MUL.16.X         Group floating-point multiply half exact
GF.MUL.32           Group floating-point multiply single
GF.MUL.32.C         Group floating-point multiply single ceiling
GF.MUL.32.F         Group floating-point multiply single floor
GF.MUL.32.N         Group floating-point multiply single nearest
GF.MUL.32.T         Group floating-point multiply single truncate
GF.MUL.32.X         Group floating-point multiply single exact
GF.MUL.64           Group floating-point multiply double
GF.MUL.64.C         Group floating-point multiply double ceiling
GF.MUL.64.F         Group floating-point multiply double floor
GF.MUL.64.N         Group floating-point multiply double nearest
GF.MUL.64.T         Group floating-point multiply double truncate
GF.MUL.64.X         Group floating-point multiply double exact



            op     prec        round/trap
add         ADD    16 32 64    none C F N T X
divide      DIV    16 32 64    none C F N T X
multiply    MUL    16 32 64    none C F N T X



The contents of registers ra and rb are combined using the specified floating-point
operation. The result is placed in register rc. The operation is rounded using the
specified rounding option or using round-to-nearest if not specified. If a rounding
option is specified, the operation raises a floating-point exception if a floating-point
invalid operation, divide by zero, overflow, or underflow occurs, or when specified,
if the result is inexact. If a rounding option is not specified, floating-point
exceptions are not raised, and are handled according to the default rules of IEEE
754.
Definition 

def GroupFloatingPoint(op,prec,round,ra,rb,rc) as
    a <- REG[ra]
    b <- REG[rb]
    for i <- 0 to 128-prec by prec
        ai <- F(prec,a_{i+prec-1..i})
        bi <- F(prec,b_{i+prec-1..i})
        if round != NONE then
            if isSignallingNaN(ai) | isSignallingNaN(bi)
                raise FloatingPointException
            endif
            case op of
                F.DIV:
                    if bi=0 then
                        raise FloatingPointArithmetic
                    endif
                others:
            endcase
        endif
        case op of
            GF.ADD:
                ci <- ai+bi
            GF.MUL:
                ci <- ai*bi
            GF.DIV:

' r'y 

        case op of
            GF.ADD, GF.MUL, GF.DIV:
                c_{i+prec-1..i} <- PackF(prec, ci)
        endcase
    endfor

endcase 

case round of 


NONE' , " £ % $$%k 

endcase ^ . c v 
if rco then jf^€^ f% V " 

enJit 

    REG[rc] <- c

Reserved instruction
Floating-point arithmetic
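
The unpack, operate, and repack structure of the group floating-point definition can be illustrated for the single-precision case with Python's struct module. The helper names are hypothetical, lane 0 is assumed to occupy the low-order bits, and only the default round-to-nearest behavior is modeled; directed rounding and exception signalling are not.

```python
import struct

def unpack_singles(x):
    """Split a 128-bit integer into four IEEE 754 single-precision lanes."""
    return list(struct.unpack("<4f", x.to_bytes(16, "little")))

def pack_singles(values):
    """Pack four floats into a 128-bit integer, rounding each to single."""
    return int.from_bytes(struct.pack("<4f", *values), "little")

def gf_op_32(op, a, b):
    """Lane-wise GF.ADD.32 or GF.MUL.32 with default rounding."""
    fa, fb = unpack_singles(a), unpack_singles(b)
    if op == "ADD":
        lanes = [x + y for x, y in zip(fa, fb)]
    elif op == "MUL":
        lanes = [x * y for x, y in zip(fa, fb)]
    else:
        raise ValueError("unknown op")
    return pack_singles(lanes)
```

Here `unpack_singles` plays the role of F(prec, ...) and `pack_singles` the role of PackF in the definition. Note that struct raises OverflowError for results too large for single precision, which differs from IEEE 754 default handling.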
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Group Floating-point Reversed 

These operations take two values from registers, perform floating-point arithmetic 
on groups of bits in the operands, and place the concatenated results in a register. 


Operation codes

GF.SET.NUE.32.X     Group floating-point set not unordered or equal single exact
GF.SET.NUE.64       Group floating-point set not unordered or equal double
GF.SET.NUE.64.X     Group floating-point set not unordered or equal double exact
GF.SET.NUGE.16      Group floating-point set not unordered greater or equal half
GF.SET.NUGE.32      Group floating-point set not unordered greater or equal single
GF.SET.NUGE.64      Group floating-point set not unordered greater or equal double
GF.SET.NUL.16       Group floating-point set not unordered or less half
GF.SET.NUL.32       Group floating-point set not unordered or less single
GF.SET.NUL.64       Group floating-point set not unordered or less double
GF.SET.UE.16        Group floating-point set unordered or equal half
GF.SET.UE.16.X      Group floating-point set unordered or equal half exact
GF.SET.UE.32        Group floating-point set unordered or equal single
GF.SET.UE.32.X      Group floating-point set unordered or equal single exact
GF.SET.UE.64        Group floating-point set unordered or equal double
GF.SET.UE.64.X      Group floating-point set unordered or equal double exact
GF.SET.UGE.16       Group floating-point set unordered greater or equal half
GF.SET.UGE.32       Group floating-point set unordered greater or equal single
GF.SET.UGE.64       Group floating-point set unordered greater or equal double
GF.SET.UL.16        Group floating-point set unordered or less half
GF.SET.UL.32        Group floating-point set unordered or less single
GF.SET.UL.64        Group floating-point set unordered or less double
GF.SUB.16           Group floating-point subtract half
GF.SUB.16.C         Group floating-point subtract half ceiling
GF.SUB.16.F         Group floating-point subtract half floor





            op                      prec        round/trap
set         SET. E NE UE NUE        16 32 64    none X
            SET. NUGE NUL UGE UL    16 32 64    NONE
            SET. L GE NL NGE        16 32 64    X
subtract    SUB                     16 32 64    none C F N T X




GF.op.prec.round rc=rb,ra

 31      24 23    18 17    12 11     6 5         0
| GF.prec  |   ra   |   rb   |   rc   | op.round  |


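Description

The contents of registers ra and rb are combined using the specified floating-point
operation. The result is placed in register rc. The operation is rounded using the
specified rounding option or using round-to-nearest if not specified. If a rounding
option is specified, the operation raises a floating-point exception if a floating-point
invalid operation, divide by zero, overflow, or underflow occurs, or when specified,
if the result is inexact. If a rounding option is not specified, floating-point
exceptions are not raised, and are handled according to the default rules of IEEE
754.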


Definition 

def GroupFloatingPointRevergldl^ 
if rao or rbo then 

raise Reservef|| 

a <- REG[ra] 

for i <r~ 0 \t 
ai<- 

bi <- F(prj 
if roundfclll|toth< 
|!gWlin| 


F.Sp-.GE.W , 
if !sNd$^isNaN(l^then 

raise FloalingPointArlthmetic 

endif 

endcase 

endif 

        case op of
            GF.SUB:
                ci <- bi-ai
            GF.SET.NUGE, GF.SET.L:
                ci <- bi !?>= ai
            GF.SET.NUL, GF.SET.GE:
                ci <- bi !?< ai
            GF.SET.UGE, GF.SET.NL:
                ci <- bi ?>= ai
            GF.SET.UL, GF.SET.NGE:
                ci <- bi ?< ai
            GF.SET.UE:
                ci <- bi ?= ai
            GF.SET.NUE:
                ci <- bi !?= ai
            GF.SET.E:
                ci <- bi = ai
            GF.SET.NE:
                ci <- bi != ai
        endcase
        case op of
            GF.SUB:
                c_{i+prec-1..i} <- PackF(prec, ci)
            GF.SET.NUGE, GF.SET.NUL, GF.SET.UGE, GF.SET.UL,
            GF.SET.L, GF.SET.GE, GF.SET.E, GF.SET.NE, GF.SET.UE, GF.SET.NUE:


endcase 
endfor 
endcase 
case round of 

X: 


Cj+prec-1 
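
The unordered comparison predicates used by the GF.SET operations can be sketched in Python as follows. The table and function names are hypothetical. The sketch reads "?" as "unordered or" and "!" as negation of the whole predicate, which is an interpretation of the operator symbols; it is consistent with the way the definition pairs GF.SET.NUGE with GF.SET.L and GF.SET.NUL with GF.SET.GE. Operands are given in the reversed order of the definition (b compared against a).

```python
import math

def unordered(b, a):
    """Two values are unordered when at least one of them is a NaN."""
    return math.isnan(a) or math.isnan(b)

SET_PREDICATES = {
    "UE":   lambda b, a: unordered(b, a) or b == a,          # b ?= a
    "NUE":  lambda b, a: not (unordered(b, a) or b == a),    # b !?= a
    "UGE":  lambda b, a: unordered(b, a) or b >= a,          # b ?>= a
    "NUGE": lambda b, a: not (unordered(b, a) or b >= a),    # b !?>= a
    "UL":   lambda b, a: unordered(b, a) or b < a,           # b ?< a
    "NUL":  lambda b, a: not (unordered(b, a) or b < a),     # b !?< a
    "E":    lambda b, a: b == a,
    "NE":   lambda b, a: b != a,
}

def gf_set(op, b, a):
    """Evaluate one GF.SET predicate on a single pair of lane values."""
    return SET_PREDICATES[op](b, a)
```

Under this reading NUGE is true exactly when both operands are ordered and b is less than a, so it yields the same value as an ordinary less-than test; the L, GE, NL, and NGE mnemonics differ only in that they appear with the X (exception) option in the operation table.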
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Group Floating-point Ternary 

These operations perform floating-point arithmetic on three groups of floating- 
point operands contained in registers. 

Operation codes

GF.MULADD.16        Group floating-point multiply and add half
GF.MULSUB.16        Group floating-point multiply and subtract half
GF.MULADD.32        Group floating-point multiply and add single
GF.MULSUB.32        Group floating-point multiply and subtract single
GF.MULADD.64        Group floating-point multiply and add double
GF.MULSUB.64        Group floating-point multiply and subtract double



                         op        prec
multiply and add         MULADD    16 32 64
multiply and subtract    MULSUB    16 32 64


GF.operation.type 

■ 31 24 § 


rd 



Description

The contents of registers ra and rb represent a group of floating-point
operands and pairwise are multiplied and added to or subtracted from
the group of floating-point operands from the contents of register rc. The
results are placed in register rd. The results are rounded to the nearest
representable floating-point value in a single floating-point operation.
Floating-point exceptions are not raised, and are handled according to the default
rules of IEEE 754. These instructions cannot select a directed rounding mode or
trap on inexact.

Definition 

def GroupFloatingPointTernary(op,prec,ra,rb,rc,rd)
    a <- REG[ra]
    b <- REG[rb]
    c <- REG[rc]
    for i <- 0 to 128-prec by prec
        ai <- F(prec,a_{i+prec-1..i})
        bi <- F(prec,b_{i+prec-1..i})
        ci <- F(prec,c_{i+prec-1..i})
        case op of
            GF.MULADD:
                di <- (ai * bi) + ci
            GF.MULSUB:
                di <- (ai * bi) - ci
        endcase
        d_{i+prec-1..i} <- PackF(prec, di)
    endfor
    REG[rd] <- d
enddef

Exceptions

Reserved instruction
Floating-point arithmetic

Group Floating-point Unary

These operations take one value from a register, perform floating-point arithmetic
on groups of bits in the operands, and place the concatenated results in a register.

Operation codes


GF.ABS.16           Group floating-point absolute value half
GF.ABS.16.X         Group floating-point absolute value half exact
GF.ABS.32           Group floating-point absolute value single
GF.ABS.32.X         Group floating-point absolute value single exact
GF.ABS.64           Group floating-point absolute value double
GF.ABS.64.X         Group floating-point absolute value double exact
GF.DEFLATE.32       Group floating-point convert half from single
GF.DEFLATE.32.C     Group floating-point convert half from single ceiling
GF.DEFLATE.32.F     Group floating-point convert half from single floor
GF.DEFLATE.32.N     Group floating-point convert half from single nearest
GF.DEFLATE.32.T     Group floating-point convert half from single truncate
GF.DEFLATE.32.X     Group floating-point convert half from single exact
GF.DEFLATE.64       Group floating-point convert single from double
GF.DEFLATE.64.C     Group floating-point convert single from double ceiling
GF.DEFLATE.64.F     Group floating-point convert single from double floor
GF.DEFLATE.64.N     Group floating-point convert single from double nearest
GF.DEFLATE.64.T     Group floating-point convert single from double truncate
GF.DEFLATE.64.X     Group floating-point convert single from double exact
GF.FLOAT.16         Group floating-point convert half from integer
GF.FLOAT.16.C       Group floating-point convert half from integer ceiling
GF.FLOAT.16.F       Group floating-point convert half from integer floor
GF.FLOAT.16.N       Group floating-point convert half from integer nearest
GF.FLOAT.16.T       Group floating-point convert half from integer truncate
GF.FLOAT.16.X       Group floating-point convert half from integer exact
GF.FLOAT.32         Group floating-point convert single from integer
GF.FLOAT.32.C       Group floating-point convert single from integer ceiling
GF.FLOAT.32.F       Group floating-point convert single from integer floor
GF.FLOAT.32.N       Group floating-point convert single from integer nearest
GF.FLOAT.32.T       Group floating-point convert single from integer truncate
GF.FLOAT.32.X       Group floating-point convert single from integer exact
GF.FLOAT.64         Group floating-point convert double from integer
GF.FLOAT.64.C       Group floating-point convert double from integer ceiling
GF.FLOAT.64.F       Group floating-point convert double from integer floor
GF.FLOAT.64.N       Group floating-point convert double from integer nearest
GF.FLOAT.64.T       Group floating-point convert double from integer truncate
GF.FLOAT.64.X       Group floating-point convert double from integer exact
GF.INFLATE.16       Group floating-point convert single from half
GF.INFLATE.16.X     Group floating-point convert single from half exact
GF.INFLATE.32       Group floating-point convert double from single
GF.INFLATE.32.X     Group floating-point convert double from single exact
GF.NEG.16           Group floating-point negate half
GF.NEG.16.X         Group floating-point negate half exact
GF.NEG.32           Group floating-point negate single
GF.NEG.32.X         Group floating-point negate single exact
GF.NEG.64           Group floating-point negate double
GF.NEG.64.X         Group floating-point negate double exact
GF.SINK.16          Group floating-point convert integer from half
GF.SINK.16.C        Group floating-point convert integer from half ceiling
GF.SINK.16.F        Group floating-point convert integer from half floor
GF.SINK.16.N        Group floating-point convert integer from half nearest
GF.SINK.16.T        Group floating-point convert integer from half truncate
GF.SINK.16.X        Group floating-point convert integer from half exact
GF.SINK.32.T        Group floating-point convert integer from single truncate
GF.SINK.32.X        Group floating-point convert integer from single exact
GF.SINK.64          Group floating-point convert integer from double
GF.SINK.64.C        Group floating-point convert integer from double ceiling
GF.SINK.64.F        Group floating-point convert integer from double floor
GF.SQR.32.C         Group floating-point square root single ceiling
GF.SQR.32.F         Group floating-point square root single floor
GF.SQR.32.N         Group floating-point square root single nearest
GF.SQR.32.T         Group floating-point square root single truncate
GF.SQR.32.X         Group floating-point square root single exact
GF.SQR.64           Group floating-point square root double
GF.SQR.64.C         Group floating-point square root double ceiling
GF.SQR.64.F         Group floating-point square root double floor
GF.SQR.64.N         Group floating-point square root double nearest
GF.SQR.64.T         Group floating-point square root double truncate
GF.SQR.64.X         Group floating-point square root double exact





                             op         prec        round/trap
absolute value               ABS        16 32 64    none X
float from integer           FLOAT      16 32 64    none C F N T X
integer from float           SINK       16 32 64    none C F N T X
increase format precision    INFLATE    16 32       none X
decrease format precision    DEFLATE    32 64       none C F N T X
square root                  SQR        16 32 64    none C F N T X


GF.op.prec. round rc=i 

31 24 23 Ok 



specified floating-point 
: oplration is rounded using the 
\ if not specified. If a rounding 
lint exception if a floating-point 
:rflow occurs, or when specified, 
s not specified, floating-point 
ding to the default rules of IEEE 


def GroupFloatingPointUnary(op,prec,round,ra,rb,rc) as
    a <- REG[ra]
    case op of
        GF.ABS, GF.NEG, GF.SQR:
            for i <- 0 to 128-prec by prec
                ai <- F(prec,a_{i+prec-1..i})
                case op of
                    GF.ABS:
                        if ai < 0 then
                            ci <- -ai
                        endif
                    GF.NEG:
                        ci <- -ai
                    GF.SQR:
                        ci <- sqrt(ai)
                endcase
                c_{i+prec-1..i} <- PackF(prec, ci, round)
            endfor
        GF.SINK:
            for i <- 0 to 128-prec by prec
                ai <- F(prec,a_{i+prec-1..i})
                c_{i+prec-1..i} <- ai
            endfor
        GF.FLOAT:
            for i <- 0 to 128-prec by prec
                ai <- a_{i+prec-1..i}
                c_{i+prec-1..i} <- PackF(prec, ai, round)
            endfor
        GF.INFLATE:
            for i <- 0 to 64-prec by prec
                ai <- F(prec,a_{i+prec-1..i})
                c_{i+i+prec+prec-1..i+i} <- PackF(prec+prec, ai, round)
            endfor
        GF.DEFLATE:
            for i <- 0 to 128-prec by prec
                ai <- F(prec,a_{i+prec-1..i})
                c_{i/2+prec/2-1..i/2} <- PackF(prec/2, ai, round)
            endfor
    REG[rc] <- c
enddef

Reserved instruction
Floating-point arithmetic
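
The precision-changing cases can be illustrated in Python with the struct module, which supports IEEE 754 half ("e") and single ("f") formats. The function names are hypothetical. The lane placement (four halves in the low 64 bits, lane 0 in the low-order bits) is an assumption drawn from the loop bounds of GF.INFLATE and GF.DEFLATE, and only round-to-nearest is modeled.

```python
import struct

MASK64 = (1 << 64) - 1

def gf_inflate_16(a):
    """Widen the four half-precision lanes in the low 64 bits of a to singles."""
    halves = struct.unpack("<4e", (a & MASK64).to_bytes(8, "little"))
    return int.from_bytes(struct.pack("<4f", *halves), "little")

def gf_deflate_32(a):
    """Narrow four single-precision lanes to halves in the low 64 bits."""
    singles = struct.unpack("<4f", a.to_bytes(16, "little"))
    # struct rounds to nearest; values too large for half raise OverflowError.
    return int.from_bytes(struct.pack("<4e", *singles), "little")
```

Inflating never loses information, since every half-precision value is exactly representable in single precision, which is consistent with INFLATE offering only the none and X options while DEFLATE offers the full set of rounding options.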
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Load 


These operations add the contents of two registers to produce a virtual address, 
load data from memory, sign- or zero-extending the data to fill the destination 
register. 

Operation codes

22 L.8 need not distinguish between little-endian and big-endian ordering, nor between aligned
and unaligned, as only a single byte is loaded.

23 L.64.B need not distinguish between signed and unsigned, as the octlet fills the destination
register.

24 L.64.B.A need not distinguish between signed and unsigned, as the octlet fills the destination
register.

25 L.64.L need not distinguish between signed and unsigned, as the octlet fills the destination
register.

26 L.64.L.A need not distinguish between signed and unsigned, as the octlet fills the destination
register.

27 L.128.B need not distinguish between signed and unsigned, as the hexlet fills the destination
register pair.

28 L.128.B.A need not distinguish between signed and unsigned, as the hexlet fills the
destination register pair.

29 L.128.L need not distinguish between signed and unsigned, as the hexlet fills the destination
register pair.

30 L.128.L.A need not distinguish between signed and unsigned, as the hexlet fills the
destination register pair.

31 L.U.8 need not distinguish between little-endian and big-endian ordering, nor between aligned
and unaligned, as only a single byte is loaded.
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L.U.16.LA 

Load unsigned doublet little-endian aligned 

L.U.32.B

Load unsigned quadlet big-endian

L.U.32.B.A

Load unsigned quadlet big-endian aligned

L.U.32.L

Load unsigned quadlet little-endian

L.U.32.L.A

Load unsigned quadlet little-endian aligned

L.U.64.B

Load unsigned octlet big-endian

L.U.64.B.A

Load unsigned octlet big-endian aligned

L.U.64.L

Load unsigned octlet little-endian

L.U.64.L.A

Load unsigned octlet little-endian aligned


number format               type    size        ordering    alignment
signed byte                         8
unsigned byte               U       8
signed integer                      16 32 64    L B
signed integer aligned              16 32 64    L B         A
unsigned integer            U       16 32 64    L B
unsigned integer aligned    U       16 32 64    L B         A
register                            128         L B
register aligned                    128         L B         A


Format 
op r 


• ; - ... r« 


8 ^*V" ^€ v jjf*^ 6 6 6 



A virtual address is computed from the sum of the contents of register ra and
register rb. The contents of memory using the specified byte order is treated as
the size specified and zero-extended or sign-extended as specified, and placed into
register rc.

If alignment is specified, the computed virtual address must be aligned, that is, it 
must be an exact multiple of the size expressed in bytes. If the address is not 
aligned an "access disallowed by virtual address" exception occurs. 
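
The load behavior described above (alignment check, byte order, and sign or zero extension) can be sketched in Python. This is a simplified model: memory is a flat bytes object rather than a translated virtual address space, the function name is hypothetical, and a ValueError stands in for the "access disallowed by virtual address" exception.

```python
MASK128 = (1 << 128) - 1

def load(memory, virt_addr, size, order, signed, align):
    """Load size bits from memory and extend the value to 128 bits.

    order is "L" (little-endian) or "B" (big-endian).
    """
    nbytes = size // 8
    if align and virt_addr % nbytes != 0:
        # Aligned forms require an exact multiple of the size in bytes.
        raise ValueError("access disallowed by virtual address")
    raw = memory[virt_addr:virt_addr + nbytes]
    byteorder = "little" if order == "L" else "big"
    value = int.from_bytes(raw, byteorder, signed=signed)
    # Masking a negative Python int yields its two's-complement
    # sign extension to 128 bits.
    return value & MASK128
```

For instance, loading the bytes FE FF as a signed little-endian doublet produces a register image of all ones except the lowest bit, while the unsigned form produces 0xFFFE.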

Definition 

def Load(op,ra,rb,rc) as
    case op of
        L16L, L32L, L8, L16LA, L32LA, L16B, L32B, L16BA, L32BA,
        L64L, L64LA, L64B, L64BA:
            signed <- true
        LU16L, LU32L, LU8, LU16LA, LU32LA, LU16B, LU32B, LU16BA, LU32BA,
        LU64L, LU64LA, LU64B, LU64BA:
            signed <- false
        L128L, L128LA, L128B, L128BA:
            signed <- undefined
    endcase
    case op of
        L8, LU8:
            size <- 8
        L16L, LU16L, L16LA, LU16LA, L16B, LU16B, L16BA, LU16BA:
            size <- 16
        L32L, LU32L, L32LA, LU32LA, L32B, LU32B, L32BA, LU32BA:
            size <- 32
        L64L, LU64L, L64LA, LU64LA, L64B, LU64B, L64BA, LU64BA:
            size <- 64
        L128L, L128LA, L128B, L128BA:
            size <- 128
    endcase
    case op of
        L16L, LU16L, L32L, LU32L, L64L, LU64L, L128L,
        L16LA, LU16LA, L32LA, LU32LA, L64LA, LU64LA, L128LA:
            order <- L
        L16B, LU16B, L32B, LU32B, L64B, LU64B, L128B,
        L16BA, LU16BA, L32BA, LU32BA, L64BA, LU64BA, L128BA:
            order <- B
        L8, LU8:
            order <- undefined
    endcase
    case op of
        L16L, LU16L, L32L, LU32L, L64L, LU64L, L128L,
        L16B, LU16B, L32B, LU32B, L64B, LU64B, L128B:
            align <- false
        L16LA, LU16LA, L32LA, LU32LA, L64LA, LU64LA, L128LA,
        L16BA, LU16BA, L32BA, LU32BA, L64BA, LU64BA, L128BA:



Exceptions 

Reserved instruction 

Access disallowed by virtual address 

Access disallowed by tag 

Access disallowed by global TLB 

Access disallowed by local TLB 

Access detail required by tag 

Access detail required by local TLB 

Access detail required by global TLB 

Cache coherence intervention required by tag 

Cache coherence intervention required by local TLB 


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Cache coherence intervention required by global TLB 
Local TLB miss 
Global TLB miss 


Load Immediate

These operations add the contents of a register to a sign-extended immediate
value to produce a virtual address, load data from memory, sign- or zero-extending
the data to fill the destination register.

Operation codes


L.8.I

Load signed byte immediate

L.16.B.A.I

Load signed doublet big-endian aligned immediate

L.16.B.I

Load signed doublet big-endian immediate

L.16.L.A.I

Load signed doublet little-endian aligned immediate

L.16.L.I

Load signed doublet little-endian immediate

L.32.B.A.I

Load signed quadlet big-endian aligned immediate

L.32.B.I

Load signed quadlet big-endian immediate

L.32.L.A.I

Load signed quadlet little-endian aligned immediate

L.32.L.I

Load signed quadlet little-endian immediate



32 L.8.I need not distinguish between little-endian and big-endian ordering, nor between aligned
and unaligned, as only a single byte is loaded.

33 L.64.B.A.I need not distinguish between signed and unsigned, as the octlet fills the
destination register.

34 L.64.B.I need not distinguish between signed and unsigned, as the octlet fills the destination
register.

35 L.64.L.A.I need not distinguish between signed and unsigned, as the octlet fills the
destination register.

36 L.64.L.I need not distinguish between signed and unsigned, as the octlet fills the destination
register.

37 L.128.B.A.I need not distinguish between signed and unsigned, as the hexlet fills the
destination register pair.

38 L.128.B.I need not distinguish between signed and unsigned, as the hexlet fills the destination
register pair.

39 L.128.L.A.I need not distinguish between signed and unsigned, as the hexlet fills the
destination register pair.

40 L.128.L.I need not distinguish between signed and unsigned, as the hexlet fills the destination
register pair.

41 L.U.8.I need not distinguish between little-endian and big-endian ordering, nor between
aligned and unaligned, as only a single byte is loaded.
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L.U.16.LI 

Load unsigned doublet little-endian immediate 

L.U.32.B.A.I

Load unsigned quadlet big-endian aligned immediate

L.U.32.B.I

Load unsigned quadlet big-endian immediate

L.U.32.L.A.I

Load unsigned quadlet little-endian aligned immediate

L.U.32.L.I

Load unsigned quadlet little-endian immediate

L.U.64.B.A.I

Load unsigned octlet big-endian aligned immediate

L.U.64.B.I

Load unsigned octlet big-endian immediate

L.U.64.L.A.I

Load unsigned octlet little-endian aligned immediate

L.U.64.L.I

Load unsigned octlet little-endian immediate


number format               type    size        ordering    alignment
signed byte                         8
unsigned byte               U       8
signed integer                      16 32 64    L B
signed integer aligned              16 32 64    L B         A
unsigned integer            U       16 32 64    L B
unsigned integer aligned    U       16 32 64    L B         A
register                            128         L B
register aligned                    128         L B         A


Format 
op 


rb=ra,offset 

24 23 1.0 17 


•4 23 

"1 


offset 


Description

A virtual address is computed from the sum of the contents of register ra and the
sign-extended value of the offset field. The contents of memory using the specified
byte order is treated as the size specified and zero-extended or sign-extended as
specified, and placed into register rb.

If alignment is specified, the computed virtual address must be aligned, that is, it
must be an exact multiple of the size expressed in bytes. If the address is not
aligned an "access disallowed by virtual address" exception occurs.
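
The address computation described above can be sketched in Python as follows. The function name is hypothetical, and both the 12-bit width of the offset field and the 64-bit width of the virtual address are assumptions, not values stated legibly in this section.

```python
def effective_address(base, offset, offset_bits=12, addr_bits=64):
    """Add a sign-extended immediate offset to a base register value.

    offset_bits and addr_bits are assumed widths for illustration.
    """
    sign = 1 << (offset_bits - 1)
    offset &= (1 << offset_bits) - 1
    # XOR-and-subtract converts the raw field to a signed integer.
    signed_offset = (offset ^ sign) - sign
    return (base + signed_offset) & ((1 << addr_bits) - 1)
```

An offset field of all ones is read as -1, so `effective_address(0x1000, 0xFFF)` addresses the byte just below the base.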

Definition 

def LoadImmediate(op,ra,rb,offset) as
    case op of
        L16LI, L32LI, L8I, L16LAI, L32LAI, L16BI, L32BI, L16BAI, L32BAI,
        L64LI, L64LAI, L64BI, L64BAI:
            signed <- true
        LU16LI, LU32LI, LU8I, LU16LAI, LU32LAI,
        LU16BI, LU32BI, LU16BAI, LU32BAI,
        LU64LI, LU64LAI, LU64BI, LU64BAI:
            signed <- false
        L128LI, L128LAI, L128BI, L128BAI:
            signed <- undefined
    endcase
    case op of
        L8I, LU8I:
            size <- 8
        L16LI, LU16LI, L16LAI, LU16LAI, L16BI, LU16BI, L16BAI, LU16BAI:
            size <- 16
        L32LI, LU32LI, L32LAI, LU32LAI, L32BI, LU32BI, L32BAI, LU32BAI:
            size <- 32
        L64LI, LU64LI, L64LAI, LU64LAI, L64BI, LU64BI, L64BAI, LU64BAI:
            size <- 64
        L128LI, L128LAI, L128BI, L128BAI:
            size <- 128
    endcase
    case op of
        L16LI, LU16LI, L32LI, LU32LI, L64LI, LU64LI, L128LI,
        L16BI, LU16BI, L32BI, LU32BI, L64BI, LU64BI, L128BI:
            align <- false
        L16LAI, LU16LAI, L32LAI, LU32LAI, L64LAI, LU64LAI, L128LAI,
        L16BAI, LU16BAI, L32BAI, LU32BAI, L64BAI, LU64BAI, L128BAI:
            align <- true
        L8I, LU8I:
            align <- undefined
    endcase
    case op of
        L16LI, LU16LI, L32LI, LU32LI, L64LI, LU64LI, L128LI,
        L16LAI, LU16LAI, L32LAI, LU32LAI, L64LAI, LU64LAI, L128LAI:
            order <- L
        L16BI, LU16BI, L32BI, LU32BI, L64BI, LU64BI, L128BI,
        L16BAI, LU16BAI, L32BAI, LU32BAI, L64BAI, LU64BAI, L128BAI:
            order <- B
        L8I, LU8I:
            order <- undefined
    endcase

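    VirtAddr <- REG[ra] + (offset_{11}^{52} || offset)
    if align then
        if (VirtAddr and ((size/8)-1)) != 0 then
            raise AccessDisallowedByVirtualAddress
        endif
    endif
    b <- LoadMemory(VirtAddr,size,order)
    bx <- (b_{size-1} and signed)^{128-size} || b
    REG[rb] <- bx
enddef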

Exceptions 

Reserved instruction 

Access disallowed by virtual address 

Access disallowed by tag 

Access disallowed by global TLB 

Access disallowed by local TLB 

Access detail required by tag 

Access detail required by local TLB 

Access detail required by global TLB 

Cache coherence intervention required by tag 

Cache coherence intervention required by local TLB 
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Cache coherence intervention required by global TLB 
Local TLB miss 
Global TLB miss 


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Store 

These operations add the contents of two registers to produce a virtual address, 
and store the contents of a register into memory. 

Operation codes




size            ordering    alignment
8
16 32 64 128    L B
16 32 64 128    L B         A


42 S.8 need not specify byte ordering, nor need it specify alignment checking, as it stores a single 
byte 


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Format 

op ra,rb,rc

 31       24 23    18 17    12 11     6 5      0
| S.MINOR   |   ra   |   rb   |   rc   |   op   |


Description

A virtual address is computed from the sum of the contents of register ra and
register rb. The contents of register rc, treated as the size specified, is stored in
memory using the specified byte order.

If alignment is specified, the computed virtual address must be aligned, that is, it
must be an exact multiple of the size expressed in bytes. If the address is not
aligned an "access disallowed by virtual address" exception occurs.

Definition

def Store(op,ra,rb,rc) as
    case op of
        S8,
        S16L, S16LA,
        S32L, S32LA,
        S64L, S64LA,
        S128L,

size <- 

S16L, S16LA, S16B, S16BA:

size <- 16
S32L, S32LA, S32B, S32BA:

size <- 32
S64L, S64LA, S64B, S64BA,
SAAS64BA, SAAS64LA,
SCAS64BA, SCAS64LA, SMAS64BA, SMAS64LA, SMUX64BA, SMUX64LA:

size <- 64
S128L, S128LA, S128B, S128BA:

size <- 128

endcase 
case op of 

S8:

align <- undefined
S16L, S32L, S64L, S128L,
S16B, S32B, S64B, S128B:

align <- false 
S16LA, S32LA, S64LA, S128LA, 
S16BA, S32BA, S64BA, S128BA, 
SAAS64BA, SAAS64LA, SCAS64BA, SCAS64LA, 
SMAS64BA, SMAS64LA, SMUX64BA, SMUX64LA: 

align <- true 

endcase 
case op of 
S8: 

order <- undefined 
S16L, S32L, S64L, S128L,
S16LA, S32LA, S64LA, S128LA,
SAAS64LA, SCAS64LA, SMAS64LA, SMUX64LA:

order <- L
S16B, S32B, S64B, S128B,
S16BA, S32BA, S64BA, S128BA,
SAAS64BA, SCAS64BA, SMAS64BA, SMUX64BA:

order <- B

endcase

VirtAddr <- REG[ra] + REG[rb] 
if align then 

if (VirtAddr and ((size/8)-1)) ≠ 0 then
raise AccessDisallowedByVirtualAddress

endif 

endif 

m <- REG[rc] 
case function of 
NONE: 


m 127..64) 



c <- LoadMemory(VirtAddr,size,order)
n <- (m127..64 & m63..0) | (c & ~m63..0)
StoreMemory(VirtAddr,size,order,n)
REG[rc] <- c
MUX:

c <- LoadMemory(VirtAddr,size,order)
n <- (m127..64 & m63..0) | (c & ~m63..0)
StoreMemory(VirtAddr,size,order,n)

endcase 
enddef 
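
The masked merge used in the definition above, n <- (m127..64 & m63..0) | (c & ~m63..0), can be sketched in Python. This is only an illustration under assumptions: the function name multiplex is hypothetical, register values are modelled as Python integers, and the 64-bit operand size is taken from the S...64 operation names.

```python
MASK64 = (1 << 64) - 1


def multiplex(m: int, c: int) -> int:
    """Return the value written back to memory by the masked merge.

    m: 128-bit register value; bits 127..64 are treated as data and
       bits 63..0 as the mask, following the definition above.
    c: 64-bit value currently held in memory.
    """
    data = (m >> 64) & MASK64
    mask = m & MASK64
    # Bits selected by the mask come from the register data;
    # all other bits keep their old memory contents.
    return (data & mask) | (c & ~mask & MASK64)
```

For example, with data 0xAAAA, mask 0x00FF and old memory contents 0x1234, only the low byte is replaced.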

Exceptions 

Reserved instruction 

Access disallowed by virtual address 

Access disallowed by tag 
Access disallowed by global TLB

Access disallowed by local TLB 

Access detail required by tag 

Access detail required by local TLB 

Access detail required by global TLB 

Cache coherence intervention required by tag 

Cache coherence intervention required by local TLB 

Cache coherence intervention required by global TLB 
Store Immediate


These operations add the contents of a register to a sign-extended immediate
value to produce a virtual address, and store the contents of a register into
memory.

Operation codes 

S.8.143 



size            ordering   alignment
8
16 32 64 128    L B
16 32 64 128    L B        A


45 S.8.I need not specify byte ordering, nor need it specify alignment checking, as it stores a
single byte.
Format

S.size.order.align.I ra,rb,offset

31        24 23    18 17    12 11              0
|    op     |   ra   |   rb   |     offset      |
      8          6        6           12


Description 

A virtual address is computed from the sum of the contents of register ra and the
sign-extended value of the offset field. The contents of register rb, treated as the
size specified, is stored in memory using the specified byte order.

If alignment is specified, the computed virtual address must be aligned, that is, it
must be an exact multiple of the size expressed in bytes. If the address is not
aligned an "access disallowed by virtual address" exception occurs.


Definition

def StoreImmediate(op,ra,rb,offset) as
case op of
S8I,

S16LI, S16LAI, S16BI, S16BAI,
S32LI, S32LAI, S32BI, S32BAI,
S64LI, S64LAI, S64BI, S64BAI,
S128LI, S128LAI, S128BI, S128BAI:

function <- NONE
SAAS64BAI, SAAS64LAI:

function <- AAS
SCAS64BAI, SCAS64LAI:

function <- CAS
SMAS64BAI, SMAS64LAI:

function <- MAS
SMUX64BAI, SMUX64LAI:

function <- MUX

endcase
case op of
S8I:

size <- 8
S16LI, S16LAI, S16BI, S16BAI: 

size <- 16 
S32LI, S32LAI, S32BI, S32BAI: 

size <- 32
S64LI, S64LAI, S64BI, S64BAI,
SAAS64BAI, SAAS64LAI,
SCAS64BAI, SCAS64LAI, SMAS64BAI, SMAS64LAI, SMUX64BAI, SMUX64LAI:

size <- 64
S128LI, S128LAI, S128BI, S128BAI:
size <- 128

endcase


case op of 
S8I: 

align <- undefined 
S16LI, S32LI, S64LI, S128LI,
S16BI, S32BI, S64BI, S128BI:

align <- false
S16LAI, S32LAI, S64LAI, S128LAI, 
S16BAI, S32BAI, S64BAI, S128BAI, 
SAAS64BAI, SAAS64LAI, SCAS64BAI, SCAS64LAI, 
SMAS64BAI, SMAS64LAI, SMUX64BAI, SMUX64LAI: 

align <- true 

endcase 
case op of 
S8I: 

order <- undefined 
S16LI, S32LI, S64LI, S128LI,
S16LAI, S32LAI, S64LAI, S128LAI,
SAAS64LAI, SCAS64LAI, SMAS64LAI, SMUX64LAI:

order <- L
S16BI, S32BI, S64BI, S128BI,
S16BAI, S32BAI, S64BAI, S128BAI,
SAAS64BAI, SCAS64BAI, SMAS64BAI, SMUX64BAI:

order <- B

endcase 

VirtAddr <- REG[ra] + (offset11^52 || offset)
if align then
if (VirtAddr and ((size/8)-1)) ≠ 0 then
raise AccessDisallowedByVirtualAddress

endif 

endif 

m <- REG[rb] 
case function of 
NONE: 


ter,m 12 7..64) 



b <- LoadMemory(VirtAddr,size,order)
n <- (m127..64 & m63..0) | (b & ~m63..0)
StoreMemory(VirtAddr,size,order,n)
REG[rb] <- b

MUX:

b <- LoadMemory(VirtAddr,size,order)
n <- (m127..64 & m63..0) | (b & ~m63..0)
StoreMemory(VirtAddr,size,order,n)


endcase 
enddef 
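
The address computation and alignment check described for Store Immediate can be sketched as follows. This is a minimal illustration, not the hardware definition: the 12-bit offset width is an assumption inferred from the instruction format, and the names sign_extend and store_immediate_address are hypothetical.

```python
MASK64 = (1 << 64) - 1


class AccessDisallowedByVirtualAddress(Exception):
    """Raised when an aligned access uses a misaligned address."""


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def store_immediate_address(reg_ra: int, offset: int, size: int,
                            align: bool, offset_bits: int = 12) -> int:
    """Compute the 64-bit virtual address; size is given in bits.

    offset_bits=12 is an assumption, not a value stated in the text.
    """
    va = (reg_ra + sign_extend(offset, offset_bits)) & MASK64
    # An aligned access must be an exact multiple of the size in bytes.
    if align and va & ((size // 8) - 1):
        raise AccessDisallowedByVirtualAddress(hex(va))
    return va
```

An offset field of 0xFFF is treated as -1 under the 12-bit assumption, so a base of 0x1000 yields 0xFFF.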

Exceptions 

Reserved instruction 

Access disallowed by virtual address 

Access disallowed by tag

Access disallowed by global TLB 




Access disallowed by local TLB 

Access detail required by tag 

Access detail required by local TLB 

Access detail required by global TLB 

Cache coherence intervention required by tag 

Cache coherence intervention required by local TLB 

Cache coherence intervention required by global TLB 

Local TLB miss 

Global TLB miss 


Memory Management

This section discusses the caches, the translation mechanisms, the memory 
interfaces, and how the multiprocessor interface is used to maintain cache 
coherence. 

The Terpsichore processor provides for both local and global virtual addressing, 
arbitrary page sizes, and coherent-cache multiprocessors. The memory 
management system is designed to provide the requirements for implementation
of virtual machines as well as virtual memory. 


All facilities of the memory management system are themselves memory mapped,
in order to provide for the manipulation of these facilities by high-level language,
compiled code.

The translation mechanism is designed [...]
access to the virtual address space [...]

Privilege levels provide for [...] insecure user code and
secure system facilities, [...] specified by a two-bit
field in the access information [...] level, and three is the
most-privileged level.


[Figure: physical address and protection]


Starting from a local virtual address, the memory management system performs 
three actions in parallel: the low-order bits of the virtual address are used to 
directly access the data in the cache, a low-order bit field is used to access the 
cache tag, and the high-order bits of the virtual address are translated from a local
address space to a global virtual address space. 

Following these three actions, operations vary depending upon the cache 
implementation. The cache tag may contain either a physical address and access 
control information (a physically-tagged cache), or may contain a global virtual 
address and global protection information (a virtually-tagged cache). 

For a physically-tagged cache, the global virtual address is translated to a physical
address by the TLB, which generates global protection information. The cache
tag is checked against the physical address, to determine a cache hit. In parallel,
the local and global protection information is checked.



For a virtually-tagged cache, the cache tag is checked against the global virtual
address, to determine a cache hit, and the local and global protection information
is checked. If the cache misses, the global virtual address is translated to a
physical address by the TLB, which also generates global protection
information.


Local and Global Addressing

The 64-bit global virtual [...] tasks. In a multitask
environment, requirements [...] arise from operations
such as the UNIX [...] duplicated into parent and
child tasks, each [...] In addition, when
switching tasks, [...] must be disabled and another
task's access [...]

[...] the address space to be made local to
[...] global virtual space specified by four 16-bit
[...] each local virtual space. Terpsichore specifies [...] sets of virtual
[...] sets of these four registers. The registers specify a mask
[...] the high-order 16 address bits are checked to match a
[...] if they match, a value with which to modify the virtual
[...] Terpsichore avoids setting a fixed page size or local address size; these
[...] be set by software conventions.

A local virtual address space is specified by the following:

field name      size   description
local mask      16     mask to select fields of local virtual address to perform match over
local match     16     value to perform match with masked local virtual address
local xor       16     value to xor with local virtual address if matched
local protect   16     local protection field (detailed later)

local virtual address space specifiers
These 16-bit registers are packed together into a 64-bit register as follows:


Local Translation Lookaside Buffer
63          48 47          32 31          16 15            0
| mask[t][i]  | match[t][i]  |  xor[t][i]   | protect[t][i] |
      16             16             16              16


The LTLB contains a separate context of register sets for each thread. A context
consists of one or more sets of mask/match/xor/protect registers, one set for each
simultaneously accessible local virtual address range. This set of registers is called
the "Local TLB context," or LTLB (Local Translation Lookaside Buffer) context.
The effect of this mechanism is to provide the facilities normally attributed to
segmentation. However, in this system there is no extension of the address range;
instead, segments are local nicknames for portions of the global virtual address
space.


For instructions executing at the [...] (level 0 or level 1), a
failure to match a LTLB entry [...] exception may be
handled by loading an LTLB [...]tion, thus providing
access to an arbitrary number [...]

Instructions executing [...] (level 2 or level 3) may
access any region [...] LTLB entry matches,
and may access [...] when no LTLB entry
matches. This [...] malicious use of local
virtual address [...] privileged code may
manipulate the [...] on behalf of a less-privileged client.

[...] context is a single set of
[...] single-set LTLB context may be
[...] implementation of the mask and match [...]



If the largest possible space is reserved for an address space identifier, the virtual 
address is partitioned as shown below. Any of the bits marked as "local" below 
may be used as "offset" as desired. 


63      48 47                              0
|  local  |             offset              |
def GlobalVA,LocalProtect <- LocalVirtualToGlobalVirtualAddressTranslation(th,va,pl) as
LocalTLBMatch <- NONE
for i <- 0 to «sets per thread»-1

if (va63..48 & ~LocalTLB[th][i]63..48) = LocalTLB[th][i]47..32 then
LocalTLBMatch <- i

endif
endfor

if LocalTLBMatch = NONE then
if pl < 2 then

raise LocalTLBMiss

endif

GlobalVA <- va
LocalProtect <- 0

else

GlobalVA <- (va63..48 ^ LocalTLB[th][LocalTLBMatch]31..16) || va47..0
LocalProtect <- LocalTLB[th][LocalTLBMatch]15..0

endif 
enddef 
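
The local-to-global translation above can be sketched in Python. This is an illustrative model under stated assumptions: the xor field is taken to occupy bits 31..16 of each entry and the low 48 address bits are assumed to pass through unchanged (both are partly illegible in the source), and the names ltlb_fields and local_to_global are hypothetical.

```python
class LocalTLBMiss(Exception):
    """Raised when no entry matches at privilege level 0 or 1."""


def ltlb_fields(entry: int):
    """Split a packed 64-bit LTLB entry into mask, match, xor, protect."""
    return ((entry >> 48) & 0xFFFF, (entry >> 32) & 0xFFFF,
            (entry >> 16) & 0xFFFF, entry & 0xFFFF)


def local_to_global(context, va: int, pl: int):
    """Translate a local virtual address using one thread's LTLB context.

    context: list of packed 64-bit entries for the thread.
    Returns (global_va, local_protect).
    """
    hit = None
    for entry in context:
        mask, match, xor, protect = ltlb_fields(entry)
        # As in the definition, bits set in the mask are excluded
        # from the comparison.
        if ((va >> 48) & ~mask & 0xFFFF) == match:
            hit = (xor, protect)  # a later match overrides an earlier one
    if hit is None:
        if pl < 2:
            raise LocalTLBMiss(hex(va))
        return va, 0
    xor, protect = hit
    return va ^ (xor << 48), protect
```

At privilege level 2 or 3 an unmatched address is used directly as a global virtual address with a local protection field of zero, which mirrors the pseudocode.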


Global Virtual Cache 

The innermost levels of the ins] 
indexed and matched entirely 
block of memory data is taj 
bits of the global virtuaL 
kilobytes; foi 



ire direct-mapped and 
Consequently, each 
^ « and the high-order 
^n%virtual caches is 32 
8 kilobytes and a 
>f virtual addresses to 
it either the associated 
ie 16% order 20 bits of any 
size of the maximum 1M 

. & 

i buffer <|nemo%y (described below). It is 
Virtual tag must match, and the 
Ifcache miss or exception occurs, 
lock consists of 64 bytes, so for a 
tag information for each cache. The 
concatenation of the access, state and control 


63                          16 15          0
|     global virtual tag      |  protect   |


The following function reads the data, tag, and protection bits from either the
instruction (c=0) or data (c=1) cache, given a local virtual address.

def data,GlobalVA,GlobalProtect <- ReadCache(c,va,size) as
data <- cacheDataArray[c][va14..4]

GlobalVA <- cacheTagArray[c][va14..6]63..16 || va15..0

GlobalProtect <- cacheTagArray[c][va14..6]15..0
enddef 
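
The index and tag arithmetic in ReadCache can be illustrated with a short Python sketch. It assumes the bit ranges shown above (va14..4 for data, va14..6 for tags); the helper names cache_indices and split_tag are hypothetical.

```python
MASK64 = (1 << 64) - 1


def cache_indices(va: int):
    """Return (data_index, tag_index) for a virtual address.

    data_index is va14..4 (11 bits), tag_index is va14..6 (9 bits),
    so one tag covers a 64-byte block.
    """
    return (va >> 4) & 0x7FF, (va >> 6) & 0x1FF


def split_tag(tag: int, va: int):
    """Rebuild the global virtual address and protection from a tag entry.

    Bits 63..16 of the tag are joined with va15..0; bits 15..0 of the
    tag hold the protection field.
    """
    global_va = (tag & ~0xFFFF & MASK64) | (va & 0xFFFF)
    return global_va, tag & 0xFFFF
```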
Translation and Protection

Global virtual addresses are translated to physical addresses only upon misses in 
the virtual caches. The translation is performed by software-programmable 
routines, augmented by a hardware TLB, specifically, the global TLB. The global 
TLB labels a cache line with the physical and access information in the virtual 
cache tag. The global TLB contains a minimum of 64 entries and a maximum of 
256 entries. 



A local TLB, global TLB or virtual cache entry contains the following information.
The figures in parentheses are the actual size of the field contained, if only a sub- 
field is held in the entry. 



information in local TLB, global TLB, or virtual cache entry 
The bottom section of the table above indicates the contents of the 16-bit
protection field:

[Figure: protection information in local TLB (bits 15..0)]

[Figure: protection information in global TLB (bits 15..0; legible field names include rv, cc, da, ao, r, w)]

[Figure: protection information in instruction [...] (bits 13..0)]



Dedicated hardware mechanisms are provided to fetch and store back data blocks
in the instruction and data caches, provided that a matching entry can be found in
the global TLB. When no entry is to be found in the global TLB, an exception
handler is invoked either to generate the required information from the virtual 
address, or to place an entry in the global TLB to provide for automatic handling 
of this and other similarly addressed data blocks. 

The initial implementation of Terpsichore partitions the remainder of the local 
memory system, including a second-level joint physical cache and a DRAM-based 
memory array, into a set of separate devices, called Mnemosyne. These devices 
are accessed via a high-bandwidth, byte-wide packet interface, called Hermes, 
which is largely transparent to the Terpsichore architecture. The Mnemosyne 
devices provide single-bit correction, double-bit detection ECC on all local 
memory accesses, and a check byte protects all packet interface transfers. The
size of the secondary cache and the DRAM memory array is
implementation-dependent.

Cache Coherence 

The Terpsichore processor is intended for use in either a uniprocessor or
multiprocessor environment. At the high performance level intended for
Terpsichore, mechanisms employed in other processor designs to maintain cache
coherence, such as "snoopy cache" buses, cannot be effectively built, because the
communication latency and bandwidth between processors would be excessive.
Several cache coherence mechanisms have been designed for high-performance
processors that do not require that all memory transactions be broadcast among
the processors in a system, one of which is the Scalable Coherent Interface, or
SCI, as specified by the IEEE Standard 1596-1992.

The SCI cache coherence mechanisj 




coherence operations take time 
in the operation, and so implicit 
particular data item will tend to" 
theoretically handle (SCI is n) 
that may assure logar^' 
complete working p: 
benchmarked at th< 

It is considei 
implemented 
of a cache i 
performance of 
can be achiev< 
Terpsichore, 
experim< 


[any of the cache 
processors involved 
of processors sharing a 
ocessors that SCI can 
complex mechanism 
\pst importantly, no 
built, tested and 
% architecture. 


should not be 
''are implementation 
^ ; en the high intrinsic 
ie performance level that 
ites? advantage, though, is that 
" improve the performance of 
iche^raherj^^^'perations. 

Coherence information is maintained within the local TLB, global TLB and
cache tag, so that coherence operations may be performed at task-level, page-level,
or block-level as desired. This flexibility provides for a coherent view of
memory in multiprocessing systems with varying degrees of coupling.

Physical Addresses 




Physical addresses are 64 bits in size, consisting of a 16 bit processor node number 
and a 48 bit address. 


63      48 47                              0
|  node   |          LocalAddress           |
    16                    48


Physical addresses in which the node number is zero reference the local
processor's local memory space, providing access to local memory, cache tags,
system and interface facilities. Physical addresses in which the node number is
nonzero reference other processors' local memory spaces, using the Hydra
interface for communication.

The local memory environment of Terpsichore involves the use of up to twelve 
Hermes byte-wide packet communications channels, by which Terpsichore can 
request read or write transactions to Mnemosyne, Calliope, and Hydra devices. In 
addition, Terpsichore can issue read or write transactions to the Cerberus serial 
bus interface, via which the Mnemosyne, Calliope, Hydra and other devices' 
configuration and control registers can be accessed. The diagram shows
one possible Terpsichore memory environment:

[Figure: one possible Terpsichore memory environment]

[...] connect Terpsichore to Mnemosyne
[...] be used to connect Mnemosyne, [...]

Terpsichore provides three different mappings of the local memory environment
into its local physical address space. The non-interleaved space provides for the
access of all Mnemosyne, Calliope, and Hydra device memory spaces such that 
each device appears as a single continuous space. The uniprocessor spaces 
provide for the interleaved access of one, two, or four sets of eight Mnemosyne 
devices on separate Hermes channels as a single continuous space. The 
multiprocessor spaces provide for the interleaved access of one, two, or four sets 
of nine Mnemosyne devices on separate Hermes channels as a single continuous 
space with the ninth channel used as a cache coherency directory. 

63      48 47     40 39                         0
|         |  space  |      SpaceLocalAddress     |
The value of the space field determines the interpretation of the 48-bit local
address field as given by the following table: 


interpretation

non-interleaved Hermes channel 0..7 space
non-interleaved Hermes channel 8..11 space
8x1-way interleaved uniprocessor memory space
9x1-way interleaved multiprocessor memory space
8x2-way interleaved uniprocessor memory space
9x2-way interleaved multiprocessor memory space
8x4-way interleaved uniprocessor memory space
9x4-way interleaved multiprocessor memory space




The non-interleaved Hermes channel 0..7 space provides a single continuous
memory space for each device in Hermes channels 0..7. Mnemosyne protocols are
used. Only incoherent accesses are supported (no memory directory tags).

47     40 39  37 36 35 34                 3 2    0
|  s=0   |  c   |  m  |       addr         |  b   |
    8       3      2           32             3
The range of valid values and the interpretation of the fields is given by the
following table:

field   value       interpretation
s       0           Specify non-interleaved Hermes channel 0..7 space
c       0..7        Hermes channels 0..7
m       0..3        Module address
addr    0..2^32-1   Logical memory block address
b       0..7        Pad for conversion of byte address to block address

non-interleaved space field interpretation
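
As a hedged illustration of the field table above, the following Python sketch splits a 48-bit local address in the non-interleaved Hermes channel 0..7 space. The field widths (8, 3, 2, 32, 3) are inferred from the value ranges in the table, and the function name decode_noninterleaved is hypothetical.

```python
def decode_noninterleaved(local_address: int) -> dict:
    """Split a 48-bit local address into s, c, m, addr and b fields.

    Assumed layout, high to low: s(8) c(3) m(2) addr(32) b(3).
    """
    return {
        "s": (local_address >> 40) & 0xFF,          # space selector
        "c": (local_address >> 37) & 0x7,           # Hermes channel 0..7
        "m": (local_address >> 35) & 0x3,           # module address
        "addr": (local_address >> 3) & 0xFFFFFFFF,  # block address
        "b": local_address & 0x7,                   # byte within octlet
    }
```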


Non-interleaved Hermes channel 8..11 space

The non-interleaved Hermes channel 8..11 space provides a single continuous
memory space for each device in Hermes channels 8..11. Either Mnemosyne or
Calliope/Hydra protocols may be specified. Only incoherent accesses are
supported (no memory directory tags).

[Figure: non-interleaved Hermes channel 8..11 address format]

The range of valid values and the interpretation of the fields is given by the
following table:

field   value       interpretation
s                   Specify non-interleaved Hermes channel 8..11 space
        0..1        0: use Mnemosyne protocol
                    1: use Calliope/Hydra protocol
c                   Hermes channels 8..11
m                   Module address
addr    0..2^32-1   Logical memory block address
b       0..7        Pad for conversion of byte address to block address
Uniprocessor Interleaved Spaces

The interleaved spaces described below interleave between 8 Hermes channels 
(0..7), supporting only incoherent accesses (no memory directory tags). 
Mnemosyne protocols are used. 

47 40 39 38 37 6 5 3 2 0
| s=2 | n | addr | c | b |
8 2 32 3 3

47 40 39 38 7 6 5 3 2 0
| s=4 | n | addr | m | c | b |
8 1 32 1 3 3

47 40 39 8 7 6 5 3 2 0
| s=6 | addr | m | c | b |
8 32 2 3 3

The interleaved spaces described below interleave between 4 Hermes channels
(0..3 or 4..7), supporting only incoherent accesses (no memory directory tags).
Mnemosyne protocols are used.

47 40 39 38 37 36 5 4 3 2 0
| s=10 | d | n | addr | c | b |
8 1 2 32 2 3

47 40 39 38 37 6 5 4 3 2 0
| s=12 | d | n | addr | m | c | b |
8 1 1 32 1 2 3

47 40 39 38 7 6 5 4 3 2 0
| s=14 | d | addr | m | c | b |
8 1 32 2 2 3

The interleaved spaces described below interleave between 2 Hermes channels
(0..1, 2..3, 4..5, or 6..7), supporting only incoherent accesses (no memory directory
tags). Mnemosyne protocols are used.

47 40 39 38 37 36 35 4 3 2 0
| s=18 | d | n | addr | c | b |
8 2 2 32 1 3

47 40 39 38 37 36 5 4 3 2 0
| s=20 | d | n | addr | m | c | b |
8 2 1 32 1 1 3

47 40 39 38 37 6 5 4 3 2 0
| s=22 | d | addr | m | c | b |
8 2 32 2 1 3
The range of valid values and the interpretation of the fields is given by the
following table:

field   value                       interpretation
s       2,4,6,10,12,14,18,20,22     Specify uniprocessor interleaved space
d       0..3                        High-order bits of Hermes channel number
c       0..7                        Low-order bits of Hermes channel number
n       0..3                        High-order bits of Hermes module address
m       0..3                        Low-order bits of Hermes module address
addr    0..2^32-1                   Logical memory block address
b       0..7                        Pad for conversion of byte to block address
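
The uniprocessor interleaved formats differ only in how the 40 bits below the space field are divided. A table-driven decoder makes this explicit. The sketch below is an assumption-laden illustration: the layouts are taken from the field widths printed under each format, only four of the nine spaces are listed, and the names LAYOUTS and decode_interleaved are hypothetical.

```python
# Field widths below the 8-bit s field, listed from high bits to low bits.
LAYOUTS = {
    2: [("n", 2), ("addr", 32), ("c", 3), ("b", 3)],
    4: [("n", 1), ("addr", 32), ("m", 1), ("c", 3), ("b", 3)],
    6: [("addr", 32), ("m", 2), ("c", 3), ("b", 3)],
    10: [("d", 1), ("n", 2), ("addr", 32), ("c", 2), ("b", 3)],
}


def decode_interleaved(local_address: int) -> dict:
    """Split a 48-bit local address according to its space field s."""
    s = (local_address >> 40) & 0xFF
    fields = {"s": s}
    shift = 40
    for name, width in LAYOUTS[s]:  # KeyError for spaces not listed here
        shift -= width
        fields[name] = (local_address >> shift) & ((1 << width) - 1)
    return fields
```

Because the low-order channel bits c sit just above the 3-bit byte pad b, consecutive octlets fall on consecutive Hermes channels, which is what produces the interleaving.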



The interleaved spaces described below interleave between 9 channels for a
multiprocessor.

47 40 39 38 37 6 5 3 2 0
| s=3 | n | addr | c | b |
8 2 32 3 3

47 40 39 38 7 6 5 3 2 0
| s=5 | n | addr | m | c | b |
8 1 32 1 3 3

47 40 39 8 7 6 5 3 2 0
| s=7 | addr | m | c | b |
8 32 2 3 3
The range of valid values and the interpretation of the fields is given by the
following table:

field   value       interpretation
s       3,5,7       Specify interleaved space
c       0..7        Mnemosyne channels 0..7, before modification described below
n       0..3        High-order bits of Hermes module address
m       0..3        Low-order bits of Hermes module address
addr    0..2^32-1   Logical memory block address
b       0..7        Pad for conversion of byte to block address

interleaved space field interpretation


For the multiprocessor spaces, [...] modified by the low-
order memory block address bits [...] following tables. In addition,
access to a location in this [...] tag, using the Hermes
channel specified in the [...]

The memory tag entry is an octlet value for each 64-byte memory block. The
contents of the tag are interpreted by Terpsichore hardware as follows: (0) a zero
value indicates that the memory block is not contained in the cache, (1) a value
equal to the virtual address used to access the memory block indicates that the
value is cached at that address, and (2) any other value indicates that the value
may be cached in multiple or remote locations and requires software intervention
for interpretation.

Thus a read to a memory block accesses the tag, and if the value is zero, fills it with 
the virtual address via which the access occurred. When the memory block is 
returned to memory, the tag is accessed, and if the value is equal to the virtual 
address, the tag is reset to zero. In all other cases, an exception occurs, which is 
handled by software to implement the cache coherency mechanisms. 
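
The tag protocol in the two paragraphs above can be modelled in a few lines of Python. This is a sketch only: the dictionary standing in for the directory channel, the exception class and the function names are all hypothetical, and the software handler is represented simply by raising an exception.

```python
class CoherenceIntervention(Exception):
    """Stands in for the exception that hands control to software."""


def on_read(tags: dict, block: int, va: int) -> None:
    """Model a read of a memory block: claim the tag if it is zero."""
    if tags.get(block, 0) == 0:
        tags[block] = va  # record the virtual address now caching the block
    else:
        raise CoherenceIntervention(block)


def on_writeback(tags: dict, block: int, va: int) -> None:
    """Model returning a block to memory: release the tag if it is ours."""
    if tags.get(block, 0) == va:
        tags[block] = 0
    else:
        raise CoherenceIntervention(block)
```

In this model a second read of a block that is already tagged always traps, matching the statement that all other cases are handled by software.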
We also need to have a space available in which access to the tag via software
routines is straightforward - the non-interleaved space makes the tag available, 
but not conveniently. 

Serial Bus Space

The Cerberus serial bus space provides access to a memory space in which
Bootstrap ROM code, Terpsichore, Mnemosyne and Calliope configuration data,
and other Cerberus peripherals are accessed. The Cerberus serial bus is specified
by the document: "Cerberus Serial Bus Architecture." Terpsichore configuration
data is accessible via Cerberus as a slave device as well as via this address space.
47 43 42 27 26 1, ., 

I 3 | net | node I ' 


The range of valid values and the interpretation of the fields is given by the
following table:

field   value     interpretation
s                 Specify Cerberus space
net               Cerberus net address
node    0..255    Cerberus node address
addr              Logical memory block address
b                 Pad for conversion of byte address to block address

3 (4%#lets) 
virtual cache tags (2k-128k x 2 caches)
virtual cache data (16k-1M x 2 caches)
Global TLB (4 octlets x 64-256 entries=2k-8k bytes) 
Events and Threads

Exceptions signal several kinds of events: (1) events that are indicative of failure of 
the software or hardware, such as arithmetic overflow or parity error, (2) events 
that are hidden from the virtual process model, such as translation buffer misses, 
(3) events that infrequently occur, but may require corrective action, such as 
floating-point underflow. In addition, there are (4) external events that cause 
scheduling of a computational process, such as completion of a disk transfer or 
clock events. 


Each of these types of events requires the interruption of the current flow of
execution, handling of the exception or event, and in some cases, descheduling of
the current task and rescheduling of another. The Terpsichore processor provides
a mechanism that is based on the multi-threaded execution model of Mach. Mach
divides the well-known UNIX process model into two parts, one called a task,
which encompasses the virtual memory space, file and resource state, and the
other called a thread, which includes the program counter, stack space, and other
register file state. The sum of a Mach task and a Mach thread exactly equals one
UNIX process, and the Mach model allows a task to be associated with several
threads. On one processor at any one moment in time, at least one task with one
thread is running.

In the taxonomy of events described above, the cause of the event may either be
synchronous [...] thread, generally types 1, 2, and 3, or
asynchronous [...] task and thread that is not currently
running, generally [...] events. Terpsichore will suspend the
currently running [...] and continue execution with another
thread that is [...] event. For asynchronous events,
Terpsichore [...] the dedicated event thread, while not
necessarily [...] running thread.

[...] sufficient resources for the interleaved execution of at least
[...] thread, containing 64 general registers and a program counter, and at
[...] event thread, containing 16 general registers and a program
[...] When both threads are able to continue execution, priority is generally
[...] to the event threads.

All facilities of the exception, memory management, and interface systems are 
themselves memory mapped, in order to provide for the manipulation of these 
facilities by high-level language, compiled code. In particular, the thread 
resources of the full threads are memory-mapped so that the exception threads
are able to read and write the general registers and program counter of the full 
threads. 
Events are single-bit messages used to communicate the occurrence of exceptions
between full threads and event threads and interface devices. 


The event register appears at several locations in memory, with slightly different
side effects on read and write operations.



prj$f&& Causes Terpsichore to issue a read 
? sic^^dlress. The device referenced by this 
'e time%ith a value, which is inclusive-or'ed into 

The following notes list the resources needed to support the threads...
Events:

0       full thread 0 suspended at instruction fetch because of exception
1       full thread 0 suspended at data fetch because of exception
2       full thread 0 suspended at execution because of exception
3       full thread 0 suspended at execution because of empty pipeline
4-15    same for full threads 1-3.
16-63   timer, calliope, and hydra events
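
Since events are single-bit messages numbered 0 to 63, the event register can be modelled as a 64-bit mask. The sketch below is illustrative only: it assumes full thread t uses events 4t to 4t+3 in the same order as thread 0, which the list above implies but does not state, and the names KINDS, full_thread_event and pending are hypothetical.

```python
KINDS = ("ifetch_exception", "dfetch_exception",
         "execute_exception", "empty_pipeline")


def full_thread_event(thread: int, kind: str) -> int:
    """Return the event number for a full thread (0..3) and a cause.

    Assumes threads 1-3 repeat the thread 0 ordering at events 4-15.
    """
    if not 0 <= thread <= 3:
        raise ValueError("full threads are numbered 0..3")
    return 4 * thread + KINDS.index(kind)


def pending(event_register: int) -> list:
    """List the event numbers whose bits are set in a 64-bit register."""
    return [i for i in range(64) if (event_register >> i) & 1]
```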
Thread resources:

General registers at data fetch stage 
General registers at execution state 
Program counter, privilege level at data fetch stage 
Program counter, privilege level at execution stage 
Mask register: events which permit the thread to run 
Mask register: events which prevent the thread to run? 
Control register: suspend ifetch, dfetch, execute, thread priority 
Local TLB entries (full threads only)
Exception information registers:
exception cause

instruction which caused exception
virtual address at which access attempted
size of access attempted
type of access (read, write, execute, gateway)
did exception occur at lower or higher address if cross boundary?
don't need - register contents (ca.Q'" 


Sort by stage: 

Inst fetch stage: 

program countei 
exception stj ' ~ 
control rei " ! 



tiin required- 


local TLB hit indication), inst 


Incl a%ess type, size, boundary, 


Data fetch stai. 

Genera!^ 
progrjirm^ht^ 
control i eg i sir 

>~ suspend (dral 
%!e* 
% proceeJ f 
exception state- ca 

can compute local va from GR, inst: 
^ shift-and-add-load-shiftl-shiftr-add (7) 

computing global va is hard...=> need global va register 
can compute size from inst, 16-word table: shift and add load (4 ) 
prefetched data, instruction queue 
clear queue 
drain queue 

Execute stage: 

General registers 

program counter, privilege level 

control register: suspend


exception state: cause(flt/fix arithmetic), inst 


Exceptions: 
number  exception
0       Access disallowed by tag
1       Access detail required by tag
2       Cache coherence action required by tag
3       Access disallowed by virtual address
4       Access disallowed by global TLB
5       Access detail required by global TLB
6       Cache coherence action required by global TLB
7       Global TLB miss
8       Access disallowed by local TLB
9       Access detail required by local TLB
10      Cache coherence action required by local TLB
11      Local TLB miss
12      Floating-point arithmetic
13      Fixed-point arithmetic
14      Reserved instruction
15



Parameter passing . 

There are no special registers to indicate [...] the exception, such as the
virtual address at [...] the operands of a floating-
point operation [...] This information is available
via memory-[...]

When a synchronous [...] thread, the corresponding thread's
state is frozen, [...] event thread should handle the
exception, [...] whatever [...] required, and then may restart the full thread
by writing [...] full [...]

[...] occurs in an event thread, an immediate transfer
[...] to the machine check vector address, with information about the
exception available in the machine check cause field of the status register. The
transfer of control may overwrite state that may be necessary to recover from the
exception; the intent is to provide a satisfactory post-mortem indication of the
characteristics of the failure.

Exceptions in detail 

This section is under construction. Terpsichore has changed from passing the 
parameters in registers to passing the parameters in memory-mapped registers, 
and the information in this section doesn't reflect the changes yet. 

This section describes in detail the conditions under which exceptions occur, the
parameters passed to the exception handler, and the handling of the result of the
procedure.
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Access disallowed by tag 

This exception occurs when a read (load), write (store), execute, or gateway 
attempts to access a virtual address for which the matching virtual cache entry 
does not permit this access. 

Prototype 

int AccessDisallowedByTag(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should determine accessibility, modify the virtual memory state
if desired, and return if the access should be allowed. Upon return, execution is
restarted and the access will be retried.

Access detail required by tag

This exception occurs when a read (load), write (store), or execute attempts to
access a virtual address for which the matching virtual cache entry would permit
this access, but the detail bit in the virtual cache entry is set.

Prototype

int AccessDetailRequiredByTag(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should determine accessibility and return if the access should
be allowed. Upon return, execution is restarted and the access will be retried. If
the detail bit is set in the matching virtual cache entry, access will be permitted.

Cache coherence action required by tag

This exception occurs when a read (load, execute, or gateway), write (store), or 
replacement attempts to access a virtual address for which the coherence state of 
the matching virtual cache entry cannot permit this access. 

Prototype 

int CacheCoherenceInterventionRequired(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0 
meaning read, 1 meaning write, 2 meaning replacement. The exception handler 
should modify the cache status to make the cache line accessible. Upon return, 
execution is restarted and the access will be retried. 

Access disallowed by global TLB

This exception occurs when a read (load), write (store), execute, or gateway
attempts to access a virtual address for which the matching global TLB entry does
not permit this access.

Prototype

int AccessDisallowedByGlobalTLB(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should determine accessibility, modify the virtual memory state
if desired, and return if the access should be allowed. Upon return, execution is
restarted and, [...] in the matching global TLB [...].

Access detail required by global TLB

This exception occurs when a read (load), write (store), execute, or gateway
attempts to access a virtual address for which the matching global TLB entry
would permit this access, but the detail bit in the global TLB entry is set.

Prototype

int AccessDetailRequiredByGlobalTLB(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should determine accessibility and return if the access should
be allowed. Upon return, execution is restarted and the access will be forced to be
permitted. If the access is not to be allowed, the handler should not return.

Cache coherence action required by global TLB

This exception occurs when a read (load, execute, or gateway), write (store), or
replacement attempts to access a virtual address for which the coherence state of
the matching global TLB entry cannot permit this access.

Prototype

int CacheCoherenceInterventionRequired(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning replacement. The exception handler
should modify the virtual memory state to make the global TLB accessible. Upon
return, execution is restarted and the access will be retried.

Global TLB miss

This exception occurs when a read (load), write (store), execute, or gateway
attempts to access a virtual address for which no global TLB entry matches.

Description

The address at which the global TLB miss occurred is passed as address. The size
of the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should load a global TLB entry which defines the translation
and protection for this address. Upon return, execution is restarted and the global
TLB access will be attempted again.

Access disallowed by local TLB

This exception occurs when a read (load), write (store), execute, or gateway 
attempts to access a virtual address for which the matching local TLB entry does 
not permit this access. 

Prototype 

int AccessDisallowedByLocalTLB(int address, int size, int access)

Description 

The address at which the access was attempted is passed as address. The size of 
the access in bytes is passed as size. The type of access is passed as access, with 0 
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should determine accessibility, modify the virtual memory state
if desired, and return if the access should be allowed. Upon return, execution is
restarted and the access will be retried.

Access detail required by local TLB

This exception occurs when a read (load), write (store), execute, or gateway
attempts to access a virtual address for which the matching local TLB entry would
permit this access, but the detail bit in the local TLB entry is set.

Prototype

int AccessDetailRequiredByLocalTLB(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The
exception handler should determine accessibility and return if the access should
be allowed. Upon return, execution is restarted and the access will be forced to be
permitted. If the access is not to be allowed, the handler should not return.

Cache coherence action required by local TLB

This exception occurs when a read (load, execute, or gateway), write (store), or
replacement attempts to access a virtual address for which the coherence state of
the matching local TLB entry cannot permit this access.

Prototype

int CacheCoherenceInterventionRequired(int address, int size, int access)

Description

The address at which the access was attempted is passed as address. The size of
the access in bytes is passed as size. The type of access is passed as access, with 0
meaning read, 1 meaning write, 2 meaning replacement. The exception handler
should modify the virtual memory state to make the local TLB accessible. Upon
return, execution is restarted and the access will be retried.

Local TLB miss

This exception occurs when a read (load), write (store), execute, or gateway
attempts to access a virtual address for which no local TLB entry matches.

Prototype

void LocalTLBMiss(int address, int size, int access)

Description

The address at which the local TLB miss occurred is passed as address. The size of 
the access in bytes is passed as size. The type of access is passed as access, with 0 
meaning read, 1 meaning write, 2 meaning execute, and 3 meaning gateway. The 
exception handler should load a local TLB entry which defines the translation and 
protection for this address. Upon return, execution is restarted and the local TLB 
access will be attempted again. 
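
As a rough illustration of the handler contract described above, the sketch below looks up a translation in a software table and installs it before returning. Everything apart from the LocalTLBMiss prototype and the access-type encoding (0 read, 1 write, 2 execute, 3 gateway) is an assumption made for the example: the translation record layout, the table contents, and the single-entry simulated TLB that stands in for the real hardware registers.

```c
#include <stddef.h>

/* Access-type encoding as given in the Description text. */
enum access_type {
    ACCESS_READ = 0, ACCESS_WRITE = 1, ACCESS_EXECUTE = 2, ACCESS_GATEWAY = 3
};

/* Hypothetical software translation record; the layout is an assumption. */
struct translation {
    int virtual_base;    /* first virtual address covered */
    int length;          /* number of bytes covered */
    int physical_base;   /* corresponding physical address */
    unsigned allowed;    /* bit n set if access type n is permitted */
};

#define PERM(a) (1u << (a))

/* Hypothetical table the handler consults; values are made up. */
static const struct translation translations[] = {
    { 0x1000, 0x1000, 0x8000, PERM(ACCESS_READ) | PERM(ACCESS_WRITE) },
    { 0x4000, 0x2000, 0xA000, PERM(ACCESS_READ) | PERM(ACCESS_EXECUTE) },
};

/* Simulated single-entry local TLB, standing in for the hardware. */
struct translation simulated_tlb;
int simulated_tlb_valid = 0;

/* Returns the table entry covering address, or NULL if none does. */
const struct translation *find_translation(int address)
{
    size_t i;
    for (i = 0; i < sizeof translations / sizeof translations[0]; i++) {
        const struct translation *t = &translations[i];
        if (address >= t->virtual_base &&
            address < t->virtual_base + t->length)
            return t;
    }
    return NULL;
}

/* Loads an entry defining translation and protection for the address.
   On return the access would be attempted again by the hardware. */
void LocalTLBMiss(int address, int size, int access)
{
    const struct translation *t = find_translation(address);
    (void)size;
    (void)access;   /* protection is checked when the access is retried */
    if (t != NULL) {
        simulated_tlb = *t;
        simulated_tlb_valid = 1;
    }
}
```

The sketch leaves the no-translation case open: the text does not say what a handler should do when no mapping exists, so the simulated TLB is simply left unchanged.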

Floating-point arithmetic 

Prototype

quad FloatingPointArithmetic(int inst, quad ra, quad rb, quad rc)

Description

The contents of the instruction which was the cause of the exception is passed as
inst, and the contents of registers ra, rb and rc are passed as ra, rb and rc. The
exception handler should attempt to perform the function specified in the
instruction and service any exceptional conditions that occur. The result of the
function is placed into register rc or rd upon return.


Fixed-point arithmetic

Prototype

int FixedPointArithmetic(int inst, int ra, int rb)

Description

The contents of the instruction which was the cause of the exception is passed as
inst, and the contents of registers ra and rb are passed as ra and rb. The exception
handler should attempt to perform the function specified in the instruction and
service any exceptional conditions that occur. The result of the function is placed
into register rb or rc.

Reserved Instruction

Prototype

int ReservedInstruction(int inst, int ra, int rb)

Description

The contents of the instruction which was the cause of the exception is passed as
inst, and the contents of registers ra and rb are passed as ra and rb. The result of
the function is placed into register rd.

Access disallowed by virtual address

This exception occurs when a load, store, branch, or gateway refers to an aligned
memory operand with an improperly aligned address.

Prototype

int AccessDisallowedByVirtualAddress(int inst, int address)

Description

The contents of the instruction which was the cause of the exception is passed as
inst, and the address at which the access was attempted is passed as address.

Clock

Each Euterpe processor includes a clock that maintains processor-clock-cycle
accuracy. The value of the clock cycle register is incremented on every cycle,
regardless of the number of instructions executed on that cycle. The clock cycle
register is 64-bits long.

For testing purposes the clock cycle register is both readable and writable, though
in normal operation it should be written only at system initialization time; there is
no mechanism provided for adjusting the value in the clock cycle counter without
the possibility of losing cycles.

 63                      0
|      clock cycle        |
           64

[...] the value in the clock cycle register is equal to the value
[...] sets the specified clock event bit in the event [...].

For testing purposes the clock match register is both readable and writable,
though in normal operation it is normally written to.

 63                      0
|      clock match        |
           64

 63            6 5       0
|      0        |  clock  |
|               |  event  |

Watchdog Timer

A Machine Check is asserted when the value in the clock cycle register is equal to
the value in the watchdog timer register.

The watchdog timer register is both readable and writable, though in normal
operation it is usually and periodically written with a sufficiently large value that
the register does not equal the value in the clock cycle register before the next
time it is written.
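
The periodic rewrite described above can be sketched as follows. This is a minimal model, assuming the watchdog timer register is 64 bits wide like the clock cycle register (the text does not state its width); the function names and the notion of a caller-chosen interval are illustrative.

```c
#include <stdint.h>

/* Value software would write to the watchdog timer register: the current
   clock cycle count plus a margin. Unsigned arithmetic wraps modulo 2^64,
   matching a free-running 64-bit cycle counter (width is an assumption). */
uint64_t next_watchdog_value(uint64_t clock_cycle, uint64_t interval)
{
    return clock_cycle + interval;
}

/* Model of the hardware comparison: a machine check is asserted when the
   clock cycle register equals the watchdog timer register. */
int watchdog_asserts(uint64_t clock_cycle, uint64_t watchdog)
{
    return clock_cycle == watchdog;
}
```

Software that rewrites the register more often than once per interval never lets the two values meet, so the machine check fires only if the rewriting code stops running.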


Tally Counter

Each Euterpe processor includes [...] processor-related
events or operations. The values of the tally counters are incremented on
each processor clock cycle in which the specified events or operations occur. The tally
counter registers do not [...].

It is required that a sufficient number of bits be implemented so that the tally
counter registers overflow no more frequently than [...] per second. 32 bits is
sufficient for a [...] clock. The remaining unimplemented bits must be zero
whenever read, and ignored [...].

For testing purposes the tally counter registers are both readable and
writable, though in normal operation they should be written only at system
initialization time; there is no mechanism provided for adjusting the value in the
event counter registers without the possibility of losing counts.

 63                      0
|     tally counter 0     |
|     tally counter 1     |

The tally counter control register selects one event for each of the event counters
to tally.

 63        32 31              16 15               0
|     0      | tally control 0  | tally control 1  |
      32              16                 16

The valid values for the tally control fields are given by the following table:

value        interpretation
0..63        tally events 0..63
64           freeze counter: count nothing
65           tally instructions processed by address unit
66           tally instructions processed by execute unit
67           tally instruction cache misses
68           tally data cache misses
69           tally data cache references
70..65535    Reserved
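
A small sketch of how software might build a value for the tally counter control register from the table above. It assumes the layout read from the register diagram (zero in bits 63..32, tally control 0 in bits 31..16, tally control 1 in bits 15..0); since that diagram is damaged in the source, the field positions should be treated as an assumption. The constant and function names are hypothetical.

```c
#include <stdint.h>

/* Named selector values taken from the tally control table. */
enum tally_control {
    TALLY_FREEZE                    = 64,
    TALLY_ADDRESS_UNIT_INSTRUCTIONS = 65,
    TALLY_EXECUTE_UNIT_INSTRUCTIONS = 66,
    TALLY_INSTRUCTION_CACHE_MISSES  = 67,
    TALLY_DATA_CACHE_MISSES         = 68,
    TALLY_DATA_CACHE_REFERENCES     = 69
};

/* Values 0..69 are defined by the table; 70..65535 are reserved. */
int tally_control_is_valid(unsigned value)
{
    return value <= TALLY_DATA_CACHE_REFERENCES;
}

/* Packs two 16-bit selectors into one 64-bit register value,
   assuming control 0 occupies bits 31..16 and control 1 bits 15..0. */
uint64_t make_tally_control(unsigned control0, unsigned control1)
{
    return ((uint64_t)(control0 & 0xFFFFu) << 16) |
           (uint64_t)(control1 & 0xFFFFu);
}
```

For example, selecting instruction cache misses on counter 0 and data cache misses on counter 1 would use make_tally_control(TALLY_INSTRUCTION_CACHE_MISSES, TALLY_DATA_CACHE_MISSES).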


Control Register Addresses

This section is under construction. Software and hardware designers should not
infer anything about the value of the addresses from the ordering or entries in the
tentative table below:

Reset and Error Recovery

Certain external and internal events cause the Euterpe processor to invoke reset
or error recovery operations. These operations consist of a full or partial reset of
critical machine state, including initialization of the event thread to begin fetching
instructions from the start vector address. Software may determine the nature of
the reset or error by reading the value of the Cerberus control register, in which
finding the reset bit set (1) indicates that a reset has occurred, finding the clear bit
set (1) and the reset bit cleared (0) indicates that a logic clear has occurred, and
finding both the reset and clear bits cleared (0) indicates that a machine check has
occurred. When either a reset or machine check has been indicated, the contents
of the Cerberus status register contain more detailed information on the cause.
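
The decision rule in the paragraph above can be written down directly. The sketch takes the reset and clear bits as already-extracted values, because the text here does not give their positions within the Cerberus control register; the enum and function names are illustrative.

```c
/* Possible reasons the event thread was started at the start vector. */
enum restart_cause {
    CAUSE_RESET,
    CAUSE_LOGIC_CLEAR,
    CAUSE_MACHINE_CHECK
};

/* Classifies the restart from the reset and clear bits of the Cerberus
   control register, following the rule stated in the text:
     reset set                  -> reset
     clear set, reset cleared   -> logic clear
     both cleared               -> machine check */
enum restart_cause classify_restart(int reset_bit, int clear_bit)
{
    if (reset_bit)
        return CAUSE_RESET;
    if (clear_bit)
        return CAUSE_LOGIC_CLEAR;
    return CAUSE_MACHINE_CHECK;
}
```

Code at the start vector would typically make this check first, then consult the Cerberus status register for detail.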


Reset

A reset may be caused by a Cerberus [...] the Cerberus control
register which sets the reset bit, [...] errors including meltdown
detection, and double machine check.

A reset causes the Euterpe processor to set the configuration to minimum power
and low clock speed, note the cause of the reset in the Cerberus status register,
stabilize the phase locked loops, set the local TLB to translate all local virtual
addresses to equal physical addresses, and initialize the event thread to begin
execution at the start vector address.

Other system state [...] explicitly initialized by
software; this includes the main thread state, global TLB state,
superspring state, Hermes channel interfaces, Mnemosyne memory and Cerberus
interface devices. [...] the start vector address is responsible for initializing
these remaining [...] bootstrap code from a series
of standard [...].

Power-on Reset

A reset occurs upon initial power-on. The cause of the reset is noted by initializing
the Cerberus status register and other registers to the reset values noted below.

Cerberus-grounded Reset

A reset occurs upon observing that the Cerberus SD data signal has been at a 
logic low level for at least 33 cycles of the Cerberus SC clock signal. The cause of 
the reset is noted by initializing the Cerberus status register and other registers to 
the reset values noted below.
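
The 33-cycle condition can be modeled as a counter of consecutive low samples. This is a behavioral sketch only, assuming SD is sampled once per SC clock cycle; the structure and function names are hypothetical and do not describe the actual circuit.

```c
/* Number of consecutive SC cycles for which SD must be low. */
#define GROUNDED_RESET_CYCLES 33

/* Running count of consecutive SC cycles on which SD was sampled low. */
struct sd_monitor {
    int consecutive_low;
};

/* Call once per SC clock cycle with the sampled SD level (0 = low).
   Returns 1 while SD has been low for at least 33 consecutive cycles. */
int sd_monitor_sample(struct sd_monitor *m, int sd_level)
{
    if (sd_level == 0) {
        if (m->consecutive_low < GROUNDED_RESET_CYCLES)
            m->consecutive_low++;      /* saturate at the threshold */
    } else {
        m->consecutive_low = 0;        /* any high sample restarts the count */
    }
    return m->consecutive_low >= GROUNDED_RESET_CYCLES;
}
```

Saturating the counter keeps the reset indication asserted for as long as SD stays grounded without risking overflow.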


Cerberus Control Register Reset 

A reset occurs upon writing a one to the reset bit of the Cerberus control register. 
The cause of the reset is noted by initializing the Cerberus status register and 
other registers to the reset values noted below. 

Meltdown Detected Reset

A reset occurs if the temperature is above the threshold set by the meltdown 
margin field of the Cerberus configuration register. The cause of the reset is noted 
by setting the meltdown detected bit of the Cerberus status register. 

Double Machine Check Reset 

A reset occurs if a second machine check occurs that prevents recovery from the 
first machine check. Specifically, the occurrence of an exception in event thread,
watchdog timer error, or Cerberus transaction error while any machine check
cause bit is still set in the Cerberus status register results in a double machine
check reset. The cause of the reset is noted by setting the double machine check
bit of the Cerberus status register.


Clear

Writing a one to the clear bit of the Cerberus control register invokes a logic clear.
A logic clear causes the Euterpe processor to set the configuration to the power
and swing levels written [...] power and swing
registers, configuration [...] configuration registers,
stabilize the phase locked loops, set the local TLB to translate all local virtual
addresses to equal physical addresses, and initialize the event thread to begin
execution at the start vector address. The cause of the clear is noted by leaving
the clear bit of [...] at the end of the logic
clear.

Machine Check

Detected hardware errors, such as communications errors in one of the Hermes
channels, a watchdog timeout error, or internal cache parity [...],
[...] check will set the local TLB to
translate all local virtual addresses to equal physical addresses, note the cause of
the exception in the Cerberus status register, and transfer control of the event
thread to the start vector address. This action is similar to that of a reset, but
differs in that the configuration settings, main thread state, and Cerberus and
Mnemosyne state are preserved.

Recovery from machine checks depends on the severity of the error and the 
potential loss of information as a direct cause of the error. The start vector address 
is designed to reach instruction memory accessed via Cerberus, so that operation 
of machine check diagnostic and recovery code need not depend on proper 
operation or contents of any Hermes channel device. The program counter and 
register file state of the event thread prior to the machine check is lost (except for 
the portion of the program counter saved in the Cerberus status register), so 
diagnostic and recovery code must not assume that the register file state is 
indicative of the prior operating state of the event thread. The state of the main 
thread is frozen similarly to that of a main thread exception. 

Machine check diagnostic code determines the cause of the machine check from
the processor's Cerberus status register, and as required, the Cerberus status and 
other registers of devices connected to the ByteChannels. Any outstanding 
memory transactions may be recovered by a combination of software to re-issue 
outstanding writes, and by aborting and restarting the main thread execution 
pipeline to purge outstanding reads. 


Because Cerberus operates much more slowly than the peak speed of the Euterpe
processor under normal operation, machine check diagnostic and recovery code
will generally consume enough time that real-time interface performance targets
may have been missed. Consequently, the machine check recovery software may
need to repair further damage, such as interface buffer underruns and overruns
as may have occurred during the intervening time.

This final recovery code, which re-initializes the state of the interface system and
recovers a functional event thread state, may use the complete
machine resources, as the condition which caused the machine check will have
been resolved.



Parity or Uncorrectable Error in Cache

When a parity or uncorrectable error occurs in a Euterpe or Mnemosyne cache,
such an error is generally non-recoverable. These errors are non-recoverable
because the data in such caches may reside anywhere in memory, and because
the data in such caches may be the only up-to-date copy of that memory contents.
Consequently, the entire contents of the memory store is lost, and the severity of
the error is high enough to consider such a condition to be a system failure.

The machine check provides an opportunity to report such an error before 
shutting down a system for repairs. 

There are specific means by which a system may recover from such an error 
without failure, such as by restarting from a system-level checkpoint, from which a 
consistent memory state can be recovered. 

Parity or Uncorrectable Error in Memory

When a parity or uncorrectable error occurs in Mnemosyne or Calliope memory,
such an error may be partially recoverable. The contents of the affected area of
memory is lost, and consequently the tasks associated with that memory must
generally be aborted, or resumed from a task-level checkpoint. If the contents of
the affected memory can be recovered from mass storage, a complete recovery is
possible.

If the affected memory is that of a critical part of the operating system, such a
condition is considered a system failure, unless recovery can be accomplished
from a system-level checkpoint.

Communications Error in Hermes Channel

A communications error in Hermes channel, such as a check byte error,
command error, or timeout error, is generally [...].

Bits corresponding to the affected Hermes channel are set in the processor's
Cerberus status register. Recovery software should determine which devices are
affected, by querying the [...] device on the affected
Hermes channels.

Read and write transactions [...] at the time of a machine
check. Because the machine [...] Hermes channel(s), these
transactions will not be [...] through the
memory interface [...] and re-issue them as
stores, then must [...] state.

Communications Error in Cerberus

A communications error in Cerberus, such as a Cerberus transaction
error, is generally [...] transaction error (due to timeout)
may result from normal configuration operations to determine the
existence of optional devices in the system.

Watchdog Timeout Error

A watchdog timeout error indicates a general software or hardware failure. Such 
an error is generally treated as non-recoverable and fatal. 

Event Thread Exception 

When an event thread suffers an exception, the cause of the exception and a 
portion of the virtual address at which the exception occurred are noted in the 
Cerberus status register. Because under normal circumstances, the event thread 
should be designed not to encounter exceptions, such exceptions are treated as 
non-recoverable, fatal errors. 

Start Vector Address

The start vector address is used to initialize the event thread with a program 
counter upon a reset, clear or machine check. These causes of such initialization 
can be differentiated by the contents of the Cerberus status register. 

The start vector address is a virtual address which, when "translated" by the local
TLB to a physical address, is designed to access node number zero on the local
Cerberus network, which will ordinarily contain an interface to the bootstrap
ROM code. The Cerberus/Bootstrap ROM space is chosen to minimize the
number of internal Terpsichore resources and Terpsichore interfaces that must
be operated to begin execution or recover from a machine check.

virtual address          description
0x0003 0000 0000 0000    start vector address


Bootstrap Code

Bootstrap code requirements [...] Terpsichore System
Architecture, but remains [...] document.

The basic requirements [...] include power-on
initialization of [...] devices, using Cerberus
control registers [...] of an interface from which
further bootstrap [...] be [...] in a
priority-based ordering [...] replaceable read-only
storage devices, [...] devices, then network
interfaces, then [...] devices.

Cerberus Registers

Cerberus registers are internal read/only and read/write registers which provide
an implementation-independent mechanism to query and control the
configuration of devices in a Terpsichore system. By the use of these registers, a
user of a Terpsichore system may tailor the use of the facilities in a
general-purpose implementation for maximum performance and utility. Conversely, a
supplier of a Terpsichore system component may modify facilities in the device
without compromising compatibility with earlier implementations. These registers
are accessed via the Cerberus serial bus.

As a device component of a Terpsichore system, each Euterpe processor contains
a set of Cerberus-accessible configuration registers. Additional sets of
configuration registers are present for each additional device in a Euterpe system,
including Mnemosyne memory devices, and Calliope interface devices.

Read/only registers supply information about the Terpsichore system
implementation in a standard, implementation-independent fashion. Terpsichore
software may take advantage of this information, either to verify that a compatible
implementation of Mnemosyne is installed, or to tailor the use of the part to
conform to the characteristics of the implementation.

The read/only registers occupy addresses 0..5. An attempt to write these registers 
may cause a normal or an error response. 

Read/write registers select operating modes and select power and voltage levels 
for gates and signals. The read/write registers occupy addresses 6..9 and 25..43. 
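
The register address map given in this section (read/only at 0..5, read/write at 6..9 and 25..43, reserved at 10..24, 44..63, and 64..2^16-1) can be sketched as a classifier. The enum and function names are illustrative assumptions, and treating addresses at or above 2^16 as out of range is an inference from the stated upper bound rather than something the text says.

```c
/* Classes of Cerberus register address, following the ranges in the text.
   The class names are hypothetical. */
enum register_class {
    REG_READ_ONLY,          /* 0..5 */
    REG_READ_WRITE,         /* 6..9 and 25..43 */
    REG_RESERVED_ZERO,      /* 10..24 and 44..63: read as zero */
    REG_RESERVED_EXTENDED,  /* 64..2^16-1: zero or error response */
    REG_OUT_OF_RANGE        /* beyond the described map (assumption) */
};

enum register_class classify_register(unsigned long address)
{
    if (address <= 5)       return REG_READ_ONLY;
    if (address <= 9)       return REG_READ_WRITE;
    if (address <= 24)      return REG_RESERVED_ZERO;
    if (address <= 43)      return REG_READ_WRITE;
    if (address <= 63)      return REG_RESERVED_ZERO;
    if (address <= 65535UL) return REG_RESERVED_EXTENDED;
    return REG_OUT_OF_RANGE;
}
```

Diagnostic software could use such a check before issuing a write, since writes to read/only or reserved registers may produce either a normal or an error response.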

Reserved registers in the range 10..24 and 44..63 must appear to be read/only
registers with a zero value. An attempt to write these registers may cause a normal
or an error response.

Reserved registers in the range 64..2^16-1 may be implemented either as read/only
registers with a zero value, or as addresses which cause an error response if reads
or writes are attempted.

The format of the registers is [...] below. The octlet is the
Cerberus address of the register; [...] the field in a register.
The value indicated is the [...] a read/only register,
and is the value [...] reset for a read/write
register. If a reset [...] initialization is not
required by this [...] to the value field. The
range is the set [...] register may be set. The
interpretation [...] of the register field; a
more comprehensive [...].

octlet  bits     field name             value                 interpretation
0       63..16   [...]
                 implementor code       0x00 40 a3 d2 □6 7f   Identifies Euterpe processor device as implemented by MicroUnity.
                 implementor revision   0x01 00               Implementation version 1.0.

octlet  bits     field name             value                 interpretation
2       63..16   manufacturer code      0x00 40 a3 69 db 3f   Identifies initial manufacturer of Euterpe processor device implemented by MicroUnity.
                 manufacturer revision  0x01 00               Manufacturing version 1.0.
3       63..16   serial number          0                     This device has no serial number capability.
                 dynamic address        0                     This device has no dynamic address capability.
4       63..60   [...]
        59..56   [...]
        55..0    [...]

field name     range   interpretation
reset          0..1    Set to invoke device's circuit reset.
clear          0..1    Set to invoke device's logic clear.
selftest       0..1    Set to invoke device's selftest; bits 60..48 may indicate depth of selftest.
defer writes           Set to cause writes to octlets 25..43 to be deferred until the next logic clear or non-deferred write.
[...]                  [...] channel, set to [...] be ignored [...]. Upon [...] input channel [...] is reset, and after [...], the input and [...] channel links are [...].
[...]                  [...] Hermes [...] cidle 0 and cidle 1 [...] place of normal [...] (0, 255), and from which [...] sampled.
[...]                  [...] on idle Hermes channel when output clock [...].
[...]                  [...] transmitted on idle Hermes channel when output clock [...].

field name                      range    interpretation
reset/clear/selftest complete   0..1     This bit is set when a reset, clear or selftest operation has been completed.
reset/clear/selftest status              This bit is set when a reset, clear or selftest operation has been completed successfully.
meltdown detected                        This bit is set when the meltdown detector has caused a reset.
[...]                                    This bit is set when a watchdog timeout has caused a machine check.
machine check detail                     Set to indicate exception code if Exception in event thread. Set to bitmap of which Hermes channels if Hermes channel error.
machine check program counter            Set to indicate bits 31..16 of the value of the event thread program counter at the initiation of a machine check.
[...]                           0..255   Value sampled on specified Hermes channel when input clock is zero (0).

field name         value   range       interpretation
raw 1                      0..255      Value sampled on specified Hermes channel immediately following sample value in raw 0 register.

octlet  bits    field name         value   range       interpretation
8       63..0   indirect address   0*      0..2^64-1   Write to this register to set physical address used for reads and writes to indirect data register.
[...]   63..0   [...]

octlet  bits     field name                 value   range    interpretation
25      63..56   Unassigned Custom knob     121     1..127   Knob settings for Unassigned custom circuits.
        55..48   Unassigned Custom knob     121     1..127   Knob settings for Unassigned custom circuits.
        47..40   CI Tag knob                121     1..127   Knob settings for CI Tag circuits.
        39..32   CD Tag knob                121     1..127   Knob settings for CD Tag circuits.
        31..24   TLB knob                   121     1..127   Knob settings for TLB circuits.
        23..16   Branch Target Cache knob   121     1..127   Knob settings for Branch Target Cache circuits.



octlet  bits     field name        value   range    interpretation
26      63..56   [...]
27      63..56   spar 5,6 knob     121     1..127   Knob settings for SOFA region spar 5,6.
        55..48   spar 5,6 knob     121     1..127   Knob settings for SOFA region spar 5,6.
        47..40   spar 5,6 knob     121     1..127   Knob settings for SOFA region spar 5,6.
        39..32   spar 5,6 knob     121     1..127   Knob settings for SOFA region spar 5,6.
        31..24   spar 3,4 knob     121     1..127   Knob settings for SOFA region spar 3,4.
        23..16   spar 3,4 knob     121     1..127   Knob settings for SOFA region spar 3,4.
        15..8    spar 3,4 knob     121     1..127   Knob settings for SOFA region spar 3,4.
        7..0     spar 3,4 knob     121     1..127   Knob settings for SOFA region spar 3,4.

octlet  bits     field name        value   range    interpretation
[...]   15..8    [...]                              Knob settings for SOFA region spar [...].
        7..0     [...]             121     1..127   Knob settings for SOFA region spar 13,14.
[...]   [...]    spar 11,12 knob   121     1..127   Knob settings for SOFA region spar 11,12.
        [...]    spar 11,12 knob   121     1..127   Knob settings for SOFA region spar 11,12.
        [...]    spar 11,12 knob   121     1..127   Knob settings for SOFA region spar 11,12.
        [...]    spar 11,12 knob   121     1..127   Knob settings for SOFA region spar 11,12.
        [...]    spar 9,10 knob    121     1..127   Knob settings for SOFA region spar 9,10.
        [...]    spar 9,10 knob    121     1..127   Knob settings for SOFA region spar 9,10.


octlet [...], bits 55..48, 47..40, 39..32, 31..24, 23..16, 15..8, 7..0

field name                value   range    interpretation
Hermes channel knob       121     1..127   Knob settings for Hermes channel circuits.
Westside Repeaters knob   121     1..127   Knob settings for Westside Repeater circuits.
D Cache knob              121     1..127   Knob settings for Data Cache circuits.
Spring knob               121     1..127   Knob settings for Spring circuits.
Unassigned Custom knob    121     1..127   Knob settings for Unassigned custom circuits.
Unassigned Custom knob    121     1..127   Knob settings for Unassigned custom circuits.
spar 13,14 knob           121     1..127   Knob settings for SOFA region spar 13,14.
spar 13,14 knob           121     1..127   Knob settings for SOFA region spar 13,14.

octlet  bits     field name                   value   range     interpretation
31      63       0                            0       0         Reserved
        62..58   resistor fine tuning         20*     0..31     Set to fine-tune resistor termination value.
        57..56   swing fine tuning            1*      0..3      Set to fine-tune voltage swing and reference level knob settings.
        55       0                            0       0         Reserved
        54..52   process control              5       4..6      Set based on value read from PMOS drive strength [...] resistor values in knob settings.
        51       0                            0       0         Reserved
        50..48   PMOS drive strength                  0..7      This read/only field indicates the drive strength of PMOS devices expressed as a [...] binary value.
        47..43   PLL1 divide ratio            8*                [...]
        42       PLL1 feedback bypass                           Set to invoke PLL1 feedback bypass.
        41       PLL1 range                   0       0..1      Set for operation at high frequency (above 0.xxx GHz); cleared for operation at low frequency (below 0.yyy GHz).
        40       [...]                        0       0..1      Set to invoke PLL0 and PLL1 prescaler bypass; otherwise divide input clock by 10.
        39..35   PLL0 divide ratio                    3..23     PLL0 divide ratio.
        34       PLL0 feedback bypass                           Set to invoke PLL0 feedback bypass.
        33       PLL0 range                                     Set for operation at high frequency (above 0.xxx GHz); cleared for operation at low frequency (below 0.yyy GHz).
        32       conversion prescaler bypass  0       0..1      Set to invoke temperature conversion prescaler bypass; otherwise divide input clock by 10.
        31..24   analog measurement           0       0..255    Set to measure analog levels at various test points within device.
        23..22   meltdown threshold           0       0..3      Set to perform margin testing of the meltdown detector.
        21       conversion start             0*      0..1      Setting this bit causes the conversion to begin. The bit remains set until conversion is complete.
        20       0                            0       0         Reserved (selection extension).
        19..16   conversion selection         0*      0..9      Field selects which of ten [...] measurements are taken.
        15..10   0                            0       0         Reserved (counter extension).
        9..0     conversion counter           0*      0..1023   This field is set to the two's complement of the downslope count. The counter counts upward to zero, and then continues counting on the upslope until conversion completes.

[...]                                                           Reserved for use with additional Hermes channel interfaces.

octlet      field name                   interpretation
64..65536   configuration memory space   Reserved for use with later revisions of the architecture.
Identification Registers

The identification registers in octlets 0..3 comply with the requirements of the
Cerberus architecture.

MicroUnity's company identifier is: 0000 0000 0000 0010 1100 0101. 
MicroUnity's architecture code for Euterpe is specified by the following table: 


Internal code name 

Code number 

Euterpe 

0x00 40 a3 24 69M*^ 


Euterpe architecture revisions are specified by the following table:



le following table: 


tes implementation codes 


'nity, uses manufacturer codes 


MicroUnity's Euterpe, as implemented by MicroUnity, and manufactured by
MicroUnity, uses manufacturer revisions as specified by the following table:


Internal code name 

Code number 

1.0 

0x01 00 
Architecture Description Registers

The architecture description registers in octlets 4 and 5 comply with the Cerberus 
specification and contain a machine-readable version of the architecture 
parameters: A and W described in this document. 
These registers are still under construction and will contain non-zero values in a
later revision of this document.

Parameters will describe number of Hermes ports, size of internal caches, and
integration of Calliope and Mnemosyne functions.

Control Register

The control register is a 64-bit register with both read and write access. It is
altered only by Cerberus accesses; Euterpe does not alter the values written to
this register.

The reset bit of the control register complies with the Cerberus specification and
provides the ability to reset an individual Euterpe device in a Terpsichore system.
Writing a one (1) to this bit is equivalent to a power-on reset or a broadcast
Cerberus reset (low level on SD for 33 cycles) and resets configuration registers to
their power-on values, which is an operating state that consumes minimal current,
and also causes all internal high-bandwidth logic to be reset. The duration of the
reset is sufficient for the operating state changes to have taken effect. At the
completion of the reset operation, the reset/clear/selftest complete bit of the status
register is set, the reset/clear/selftest status bit of the status register is set, and the
reset bit of the control register is cleared.


The clear bit of the control register complies with the Cerberus specification and
provides the ability to clear an individual Euterpe device in a system.
Writing a one (1) to this bit causes all internal high-bandwidth logic to be reset, as
is required [...] swing levels. The duration of the reset is
sufficient for any operating state changes to have taken effect. At the completion
of the reset operation, the reset/clear/selftest complete bit of the status register is
set, the reset/clear/selftest status bit of the status register is set, and the clear bit
of the control register is cleared.



The selftest bit of the control register complies with the Cerberus specification
and provides the ability to invoke a selftest on an individual Euterpe device in a
system. Euterpe does not define a selftest mechanism at this time, so
setting this bit will immediately set the reset/clear/selftest complete bit and the
reset/clear/selftest status bit of the status register.

The channel under test field of the control register provides a mechanism to test 
and adjust skews on a single Hermes channel at a time. The field is set to the 
channel number for which the cidle 0, cidle 1, raw 0, and raw 1 fields are active. 

The cidle 0 and cidle 1 fields of the control register provide a mechanism to
repeatedly send simple patterns on the selected Hermes output channel for
purposes of testing and skew adjustment. For normal operation, the cidle 0 field
must be set to zero (0), and the cidle 1 field must be set to all ones (255).
Status Register

The status register is a 64-bit register with both read and write access, though the 
only legal value which may be written is a zero, to clear the register. The result of 
writing a non-zero value is not specified. 

The reset/clear/selftest complete bit of the status register complies with the 
Cerberus specification and is set upon the completion of a reset, clear or selftest 
operation as described above. 


The reset/clear/selftest status bit of the status register complies with the Cerberus
specification and is set upon the successful completion of a reset, clear or selftest
operation as described above.

The meltdown detected bit of the status register is set when the meltdown
detector has discovered an on-chip temperature above the threshold set by the
meltdown threshold field of the configuration register, causing a
reset to occur.



The double machine check 
check occurs that prev< 
indicative of machine 
of an exception in 
error while any m< 
Hermes error wj " 
the Cerberus 


The other reset' 
other causes of 


lerberus transaction 
Register is still set, or any 
" the%|atus register is set in 
reset. 

:d for the indication of 


Iter is set when an event thread 
check. The exception code is 
te status register. 

The watchdog timeout error bit of the status register is set when the watchdog
[...] to the clock cycle register, causing a machine check.

The Cerberus transaction error bit of the status register is set when a Cerberus 
transaction error (bus timeout, invalid transaction code, invalid address) has
caused a machine check. Note that Cerberus aborts, including locally detected 
parity errors, should cause bus retries, not a machine check. 

The Hermes check byte error bit of the status register is set when a Hermes 
check byte error has caused a machine check. The bit corresponding to the 
Hermes channel number which has suffered the error is set in the machine check 
detail field of the status register. 

The Hermes command error bit of the status register is set when a Hermes 
command error has caused a machine check. The bit corresponding to the 
Hermes channel number which has suffered the error is set in the machine check
detail field of the status register.

The Hermes timeout error bit of the status register is set when a ByteChannel
timeout error has caused a machine check. The bit corresponding to the Hermes 
channel number which has suffered the error is set in the machine check detail 
field of the status register. 

The machine check detail field of the status register is set when a machine check
has been completed. For a Hermes channel error (check byte, command or
timeout), the value indicates, via a bit-mask, ByteChannels for which machine
checks have been reported. For an exception in event thread, the value indicates
the type of exception for which the most recent machine check has been reported.

The machine check program counter field of the status register is loaded with bits
31..16 of the event thread program counter at which the most recent machine
check has occurred. The value in this field provides a diagnostic capability
for purposes of software development [...] recovery.

The raw 0 and raw 1 fields of the status register [...] values obtained from
two adjacent samples of [...] channel. The raw 0 field
contains a value [...] and the raw 1 field
contains the [...] example, when the input
clock was (1), [...] status register produces two
adjacent samples [...] register read operation on
Cerberus. This [...] control of skew in the
Hermes channel [...].


[...] registers control the power and voltage
levels used in [...] logic and memory. The details of
[...] are described below.

[...] separately control the power and voltage levels used in a portion of
[...] circuitry. Each such field contains configuration data in the following
format:


  7   6   5   4   3   2       0
| 0 |  ref  |  lvl  |   res   |


power and swing controls
The range of valid values and the interpretation of the fields is given by the
following table:


field    value    interpretation
0        0        Reserved
ref      0..3     Set reference voltage level.
lvl      0..3     Set voltage swing level.
res      0..7     Set resistor load value.


The reference voltage level, voltage swing level and resistor load value are model
figures for a full-swing, lowest-power logic gate output. Actual voltage levels
and resistor load values used in various circuits are [...] related to the
values in the tables below. Designed typical full-swing [...] for the ref, lvl and
res fields are ref=250 millivolts, lvl=500 millivolts [...].


The ref field, together with the swing fine tuning field of the configuration register,
controls the reference voltage level used for logic circuits in the specified knob
domain. Values and interpretations of the ref field are given by the following table,
with units in millivolts:


The lvl field, together with the swing fine tuning field of the configuration register,
controls the voltage swing level used for logic circuits in the specified knob domain.
Values and interpretations of the lvl field are given by the following table, with
units in millivolts:


             swing fine tuning
lvl        0      1      2      3
 0       275    300    325    350
 1       375    400    425    450
 2       475    500    525    550
 3       575    600    650    700


Voltage swing level control
The res field, together with the process control field of the configuration register,
controls the PMOS load resistance value used for logic circuits in the specified
knob domain. Values and interpretations of the res field are given by the following
table, with units in kilohms. The table below gives resistance values with nominal
process parameters.



                            process control
res      0       1      2      3      4      5      6      7
 0    undefined
 1             2.5    5.0    7.5    10.    13.           18.
 2             1.3    2.5    3.8    5.0           7.5    8.8
 3             .83    1.7    2.5    3.3           5.0    5.8
 4             .63    1.3    1.9    2.5           3.8    4.4
 5             .50    1.0    1.5    2.0           3.0    3.5
 6             .42    .83                         2.5    2.9
 7             .36    .71    1.1    1.4           2.1    2.5


Resistor load value control
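
The legible entries of the table above are consistent with a simple rule: the load resistance is 2.5 kilohms times the process control value, divided by the res value. The following is a minimal Python sketch under that assumption. The rule is inferred from the tabulated values rather than stated in the text, and the function name pmos_load_kohms is illustrative only.

```python
def pmos_load_kohms(process_control: int, res: int) -> float:
    """Nominal PMOS load resistance in kilohms.

    Assumes R = 2.5 * process_control / res, a rule inferred from the
    tabulated values. res = 0 is undefined and process control 0 is reserved.
    """
    if not 1 <= process_control <= 7:
        raise ValueError("process control must be in 1..7")
    if not 1 <= res <= 7:
        raise ValueError("res must be in 1..7")
    return 2.5 * process_control / res


# Spot checks against tabulated entries (the table values are rounded).
print(round(pmos_load_kohms(1, 1), 2))  # tabulated as 2.5
print(round(pmos_load_kohms(2, 3), 2))  # tabulated as 1.7
print(round(pmos_load_kohms(7, 6), 2))  # tabulated as 2.9
```

Under the same assumption, the nominal process control setting of 5 gives 12.5/res kilohms, so res=5 corresponds to a 2.5 kilohm load.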

When the process control field of the configuration register is set equal to the
PMOS drive strength field of the configuration register, nominal PMOS load
resistance values are as given by the following table, with units in kilohms.



f *e% 

PMOS load resistance 


j undefined j 




'"6 3 







^kF ... 

' 2.1 

% 7 



When the device is reset, a default value of 0 is loaded into each 0 field, 3 in
each ref field, 3 in each lvl field and 1 in each res field, which is a byte value of
121. The process control field of the configuration register is set to 5, and the
swing fine tuning field is set to 1. These settings correspond to a chip with nominal
processing parameters, low power and high voltage swing operation.

For nominal operating conditions, the ref field is set to 2, the lvl field is set to 2, and
the res field is set to 5, which is a byte value of 85. The process control field is set
equal to the PMOS drive strength field, and the swing fine tuning field is set to 1.
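
The byte values quoted for these settings can be checked by packing the fields. The following is a minimal Python sketch, assuming the layout in which bit 7 is the reserved 0 field, ref occupies bits 6..5, lvl bits 4..3 and res bits 2..0. That layout reproduces the byte value of 85 quoted for the nominal settings. The names pack_knob and unpack_knob are illustrative and not part of the architecture.

```python
def pack_knob(ref: int, lvl: int, res: int) -> int:
    """Pack ref (2 bits), lvl (2 bits) and res (3 bits) into one knob byte.

    Bit 7 is the reserved 0 field and is always written as zero.
    """
    if not 0 <= ref <= 3:
        raise ValueError("ref must be in 0..3")
    if not 0 <= lvl <= 3:
        raise ValueError("lvl must be in 0..3")
    if not 0 <= res <= 7:
        raise ValueError("res must be in 0..7")
    return (ref << 5) | (lvl << 3) | res


def unpack_knob(byte: int) -> tuple[int, int, int]:
    """Return (ref, lvl, res) from a knob byte, ignoring reserved bit 7."""
    return (byte >> 5) & 0x3, (byte >> 3) & 0x3, byte & 0x7


reset_value = pack_knob(ref=3, lvl=3, res=1)    # reset defaults
nominal_value = pack_knob(ref=2, lvl=2, res=5)  # nominal operating point
print(reset_value, nominal_value)
```

With this layout the reset defaults pack to 121 and the nominal settings pack to 85.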
Configuration Register

A Configuration register is provided on the Euterpe processor to control the fine-
tuning of the Hermes channel configuration, to control the global process
parameter settings, to control the two phase-locked loop frequency generators,
and to control the temperature sensors and read temperature values.

The resistor fine tuning field of the configuration register controls the analog bias
settings for PMOS loads in Hermes channel input and output termination circuits,
in order to accommodate variations in circuit parameters due to the manufacturing
process, and to provide fine-tuning of the input and output impedance levels.
Under normal operating conditions, four times (4*) the value read from the PMOS
drive strength field should be written into the resistor fine tuning field. In order to
provide fine-tuning of the input and output impedance levels, an external
measurement of the impedance or voltage levels is required. A [...] change of the
resistor fine tuning field causes a proportional change in the input and output
impedance levels. The interpretation of the field is given by the table:



a small offset in 
ic circuits. The swing 
of the Hermes channel 
; intppretation of the field 
The process control field of the configuration register controls the analog bias
settings for PMOS loads in internal logic circuits, in order to accommodate
variations in circuit parameters due to the manufacturing process. Under normal
operating conditions, the value read from the PMOS drive strength field should be
written into the process control field. The interpretation of the field is given by the
table:


value    process control
  0      Reserved
  1      increase PMOS conductance to 5.00*nominal
  2      increase PMOS conductance to 2.50*nominal
  3      increase PMOS conductance to 1.66*nominal
  4      increase PMOS conductance to 1.25*nominal
  5      use PMOS loads at nominal conductance
  6      decrease PMOS conductance to 0.83*nominal
  7      decrease PMOS conductance to 0.71*nominal
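
The tabulated multipliers are close to 5 divided by the field value, and under normal operating conditions both calibration fields are derived from the read-only PMOS drive strength field. The following is a minimal Python sketch under those assumptions. The 5/value rule is inferred from the table, the factor of four for resistor fine tuning is the one given for normal operating conditions, and the function names are illustrative only.

```python
def conductance_multiplier(process_control: int) -> float:
    """Approximate PMOS conductance relative to nominal (5 is nominal)."""
    if not 1 <= process_control <= 7:
        raise ValueError("process control must be in 1..7 (0 is reserved)")
    return 5.0 / process_control


def recommended_settings(pmos_drive_strength: int) -> dict[str, int]:
    """Derive the two calibration fields from the PMOS drive strength reading."""
    if not 0 <= pmos_drive_strength <= 7:
        raise ValueError("PMOS drive strength must be in 0..7")
    return {
        "process_control": pmos_drive_strength,
        # Four times the reading always fits the 0..31 range of the field.
        "resistor_fine_tuning": 4 * pmos_drive_strength,
    }


print(recommended_settings(5))
print(round(conductance_multiplier(6), 2))
```

For a drive strength reading of 5 this yields a process control value of 5 and a resistor fine tuning value of 20.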


The PMOS drive strength field of the configuration register is a read/only field
that indicates the drive strength of PMOS devices on the
Euterpe chip, expressed as [...] used to calibrate the
power and voltage levels [...] process characteristics
of individual devices. The interpretation of the field is given by the table:



There are two identical phase-locked loop (PLL) frequency generators,
designated PLL0 and PLL1. These PLLs generate internal and external clock
signals of configurable frequency, based upon an input clock reference of either
50 MHz or 500 MHz. PLL0 controls the internal operating frequency of the
Euterpe processor, while PLL1 controls the operating frequency of the Hermes
channel interfaces. The configuration fields for PLL0 and PLL1 have identical
meanings, described below:

The PLL0 divide ratio and PLL1 divide ratio fields select the divider ratio for
each PLL, where legal values are in the range 8..23. These divider ratios permit
clock signals to be generated in the range from 400 MHz to 1.15 GHz when the
input clock reference is at 50 MHz, with prescaling bypassed, or at 500 MHz with
prescaling used.
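
The quoted frequency range follows if the generated clock equals the (optionally prescaled) reference multiplied by the divide ratio. The following is a minimal Python sketch under that assumption. The function and constant names are illustrative, and the feedback bypass mode is not modelled.

```python
PRESCALE_DIVISOR = 10  # applied to the input clock unless the prescaler is bypassed


def pll_output_hz(input_hz: int, divide_ratio: int, prescaler_bypass: bool) -> int:
    """Generated clock frequency for one PLL, ignoring feedback bypass."""
    if not 8 <= divide_ratio <= 23:
        raise ValueError("divide ratio must be in 8..23")
    reference_hz = input_hz if prescaler_bypass else input_hz // PRESCALE_DIVISOR
    return reference_hz * divide_ratio


print(pll_output_hz(50_000_000, 8, prescaler_bypass=True))     # 400000000
print(pll_output_hz(500_000_000, 23, prescaler_bypass=False))  # 1150000000
```

Both input clock options therefore present the same 50 MHz reference to the PLL, which is why the legal range of divide ratios maps to 400 MHz through 1.15 GHz in either case.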
Setting the PLL0 feedback bypass bit or the PLL1 feedback bypass bit of the
configuration register causes the generated clock to bypass the PLL oscillator and to
operate off the input clock directly. Setting these bits causes the frequency
generated to be the optionally prescaled reference clock. These bits are cleared
during normal operation, and set by a reset.

The PLL0 range field and the PLL1 range field of the configuration register are
used to select an operating range for the internal PLLs. If the PLL range is set to
zero, the PLL will operate at a low frequency (below 0.xxx GHz); if the PLL range
is set to one, the PLL will operate at a high frequency (above 0.xxx GHz). At reset
this bit is cleared, as the input clock frequency is unknown.

Setting the PLL prescaler bypass bit of the configuration register causes the
phase-locked loops PLL0 and PLL1 to use the input clock directly as a reference
clock. This bit is cleared during normal operation with a 500 MHz input clock, in
which the input clock is divided by 10, and is set for operation with a
50 MHz input clock. At reset this bit is cleared, as the input clock frequency is
unknown.

Setting the conversion prescaler bypass bit of the configuration register causes the
temperature conversion unit to use the input clock directly as a reference clock.
Otherwise, clearing this bit causes the input clock to be divided by 10 before use
as a reference clock [...] of the temperature
conversion unit is [...] operation, this bit should be
set or cleared, [...]. At reset this bit is cleared,
as the input clock frequency is unknown.

The meltdown threshold field [...] the threshold at which
meltdown is signalled [...] meltdown prevention logic. The
interpretation of the field is given below with a tolerance of ±6
degrees C, and 5 degrees [...]


% 



ffie|»v#^shold 


J5Jded$#s'C 

1 

10 degrees C 

2 

50 degrees C 

3 

20 degrees C 


The conversion start bit controls the initiation of the conversion of a temperature 
sensor or reference to a digital value. Setting this bit causes the conversion to 
begin, and the bit remains set until conversion is complete, at which time the bit is 
cleared. 

The conversion selection field controls which sensor or reference value is 
converted to a digital value. The interpretation of the field is given by the table 
below: 
value     conversion selected
  0       local temperature sensor
  1       local temperature reference
  2       remote 0 temperature sensor
  3       remote 0 temperature reference
  4       remote 1 temperature sensor
  5       remote 1 temperature reference
  6       remote 2 temperature sensor
  7       remote 2 temperature reference
  8       remote 3 temperature sensor
  9       remote 3 temperature reference
10..15    Reserved





The conversion counter field is set to the two's complement of the downslope
count. The counter counts upward to zero, at which point the upslope ramp
begins, and continues counting on the upslope until conversion
completes.

Hermes channel Configuration Registers

Configuration registers are provided on the Euterpe processor to control the
timing, current levels, [...] the twelve Hermes
channel high-bandwidth [...] is dedicated to the
control of each [...] in the configuration
register at octlet [...] circuits in common.
The Hermes [...] registers 32..43, where
32 corresponds to Hermes channel 0 and 43 corresponds to Hermes
channel 11.


[...] whether the HiC clock signal is delayed by
[...] the Hi7..0 bits. In normal, full speed
[...] a zero value. If this bit is set, the
[...] the HiC clock signal is used directly to latch the
[...]

The quadrature range bit is used to select an operating range for the quadrature
delay circuit. If the quadrature range is set to zero, the circuit will operate at a low
frequency (below 0.xxx GHz); if the quadrature range is set to one, the circuit will
operate at a high frequency (above 0.xxx GHz).

The output termination bit is used to select whether the output circuits are
resistively terminated. If the bit is set to zero, the output has high impedance; if
the bit is set to one, the output is terminated with a resistance equal to the input
termination. At reset, this bit is set to one, terminating the output.

The termination resistance field is used to select the impedance at which the
Hermes channel inputs, and optionally the Hermes channel outputs, are
terminated. The resistance level is controlled relative to the setting of the resistor
fine tuning field of the configuration register. The interpretation of the field is
given by the table, with units in Ohms and nominal PMOS conductance and bias
settings:


value    termination resistance
  0      Reserved
  1      250. Ohms
  2      125. Ohms
  3      83.3 Ohms
  4      62.5 Ohms
  5      50.0 Ohms
  6      41.7 Ohms
  7      35.7 Ohms


The output current field is used to select the current at which the Hermes
channel outputs are operated. The interpretation of the field is given by the table,
with units in mA:


value 



%:% ouWpcur 

4n|" ^ 

0 

Reserved 



1 




2 




3 

8 mA 

4 



fc^ 

5 s> 

— 


el-j^l _ 

6 

; i a: ma. 



i 7 





The output voltage swing is the product of the composite termination resistance:
(input termination resistance^-1 + output termination resistance^-1)^-1, and the output
current. The output voltage swing should be set at or below 700 mV, and is
normally set to the lowest value which permits a sufficiently low bit error rate,
which depends upon the noise level in the system environment.
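
As a worked illustration of the product described above, the following minimal Python sketch computes the swing from the nominal termination resistance values and an output current in mA. The dictionary and function names are illustrative. The sketch assumes that a terminated output uses a resistance equal to the input termination and that a high-impedance output leaves only the input termination in the composite.

```python
# Nominal termination resistance in ohms, indexed by the termination
# resistance field value (0 is reserved).
TERMINATION_OHMS = {1: 250.0, 2: 125.0, 3: 83.3, 4: 62.5, 5: 50.0, 6: 41.7, 7: 35.7}


def output_swing_mv(termination_field: int, output_current_ma: float,
                    output_terminated: bool) -> float:
    """Output voltage swing in millivolts for one Hermes channel."""
    r_in = TERMINATION_OHMS[termination_field]
    if output_terminated:
        # Output termination equals the input termination, so the two
        # resistances in parallel give half of either one.
        r_composite = (r_in * r_in) / (r_in + r_in)
    else:
        # A high-impedance output leaves only the input termination.
        r_composite = r_in
    return r_composite * output_current_ma  # ohms * mA = mV


swing = output_swing_mv(5, 8.0, output_terminated=True)
print(swing, swing <= 700.0)
```

With 50 Ohm terminations at both ends and 8 mA of output current, the composite resistance is 25 Ohms and the swing is 200 mV, well under the 700 mV limit.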

The skew fields individually control the delay between the internal Hermes 
channel output clock and each of the HoC and Ho7..0 high bandwidth output 
channel signals. Each skew field contains two three-bit values, named digital skew 
and analog skew as shown below: 

  5            3 2           0
| digital skew | analog skew |
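
A skew field can be assembled from its two three-bit values as in the following minimal Python sketch, which assumes that digital skew occupies bits 5..3 and analog skew bits 2..0, as the layout above suggests. The function names are illustrative only.

```python
def pack_skew(digital_skew: int, analog_skew: int) -> int:
    """Pack two three-bit values into a six-bit skew field."""
    if not 0 <= digital_skew <= 7:
        raise ValueError("digital skew must be in 0..7")
    if not 0 <= analog_skew <= 7:
        raise ValueError("analog skew must be in 0..7")
    return (digital_skew << 3) | analog_skew


def unpack_skew(field: int) -> tuple[int, int]:
    """Return (digital skew, analog skew) from a six-bit skew field."""
    return (field >> 3) & 0x7, field & 0x7


print(pack_skew(2, 5), unpack_skew(21))
```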




The digital skew fields set the number of delay stages inserted in the output path 
of the HoC and the Ho7..0 high-bandwidth output channel signals. The analog 
skew fields control the power level, and thereby control the switching delay, of a 
single delay stage. Setting these fields permits a fine level of control over the 
relative skew between output channel signals. Nominal values for the output delay 
for various values of the digital skew and analog skew fields are given below,
assuming a nominal setting for the Hermes channel knob:


digital skew    delay (ps)    plus analog skew
     0               0              no
     1             320              yes
     2             400              yes
     3             470              yes
     4             570              yes
     5             670              yes
     6             770              yes
     7             870              yes



analog skew    delay (ps)
     0         Reserved
     1         ???
     2         ???
     3         +40
     4         +20
     5           0
     6         -10
     7         -20


_ igital skew and 1 is 
nwgy^put delay for the HoC 
Mnemosyne Memory

MicroUnity's Mnemosyne memory architecture is designed for ultra-high 
bandwidth systems. The architecture integrates fast communication channels 
with SRAM caches and interfaces to standard DRAM. 

The Mnemosyne interfaces include byte-wide input and output channels intended
to operate at rates of at least 1 GHz. These channels provide a packet
communication link to synchronous SRAM cache on chip and a controller for
external banks of conventional DRAM components. Mnemosyne provides second-level
cache and main memory for MicroUnity's Terpsichore system architecture.
However, Mnemosyne is useful in many memory applications.


Mnemosyne's interface protocol embeds [...] to a single
memory space into packets containing address, data, and
acknowledgement. The packets include [...] detect single-bit
transmission errors and multiple-bit [...]. As many as eight
operations in each device [...]. As many as four
Mnemosyne devices may be [...] and memory and to
improve the bandwidth of the [...].

Mnemosyne's SRAM [...] blocks, which are
combined to provide [...] of a fixed word size.
Dynamically-configurable [...] elimination of faulty
blocks without requiring [...] time-programmable
storage.

Mnemosyne's DRAM [...] the direct connection of multiple
banks of standard [...] to a Mnemosyne device. Variations in
access time, [...] all may be accommodated by
reading and setting [...] registers. The interface supports interleaving
to enhance [...] to improve latency for localized
addresses.


Euterpe uses Mnemosyne devices as a second-level cache and main-memory
expansion, optionally containing directory information. Each Mnemosyne
device in turn supports up to four banks of DRAM, each 72 bits wide (64 bits +
ECC). Using standard DRAM components, Terpsichore and Mnemosyne achieve
bandwidth in excess of 9 Gbytes/sec to secondary cache and 2 Gbytes/sec to
main memory. Terpsichore may use twice or four times the number of
Mnemosyne devices to expand the cache and memory and to increase the
bandwidth of the main memory system to in excess of 8 Gbytes/sec.


Architecture Framework

The Mnemosyne architecture builds upon MicroUnity's Hermes high-bandwidth 
channel architecture and upon MicroUnity's Cerberus serial bus architecture, 
and complies with the requirements of Hermes and Cerberus. Mnemosyne uses 
parameters A and W as defined by Hermes. 